Niema-Lab / ViralWasm-Epi

A client-side WebAssembly pipeline for viral molecular epidemiology
https://niema-lab.github.io/ViralWasm-Epi/
GNU General Public License v3.0
2 stars 0 forks source link

Double check if adding aliases will cause duplicate entries in dropdown #18

Closed niemasd closed 3 months ago

niemasd commented 8 months ago

Hey, @daniel-ji! I just updated ViralMSA's preloaded reference genomes to support aliases (e.g. "Monkeypox" being an alias for "Mpox", and "SC2" + "COVID-19" being aliases for "SARS-CoV-2"). In the ViralMSA web app, that caused the reference genome selection dropdown to have duplicates, which I was able to patch in the following commit:

https://github.com/niemasd/ViralMSA/commit/7ea6b642f33149d2d6753c70e1649fd17249bf31

I tried rerunning the "deploy website" GitHub Action in the ViralWasm-Epi repo, and it didn't cause duplicate entries, but the relevant part of the ViralWasm-Epi source code looks like it should have the same issue. Can you double check and see if you need to copy my ViralMSA fix above into the ViralWasm-Epi code?

niemasd commented 8 months ago

Actually, I noticed my addition of the Measles reference genome didn't update here, so I imagine I didn't actually push a rebuild of the web app. I just pushed a new commit with this fix migrated here:

https://github.com/Niema-Lab/ViralWasm-Epi/commit/620639a2e1909b51ffdda8457224133c6dc74480

I'll wait until the web app rebuilds and see if the patch worked

daniel-ji commented 8 months ago

Hi Niema,

Is everything working now? You might have to ctrl-shift-r to get the latest deployment sometimes. Thanks!

daniel-ji commented 8 months ago

Hi @niemasd I just realized that yes I likely have to copy the ViralMSA code for it to update (so that we can keep new releases separate), ViralWasm-Epi is likely now using a non-latest version of ViralMSA. I'm out right now, but I can check in ~40min if that works. Thank you!

Would you want ViralWasm-Epi to auto update to latest in the future?

niemasd commented 8 months ago

No rush at all 😄 Yeah, auto update to the newest ViralMSA is a good idea!

daniel-ji commented 8 months ago

Will do, thanks!

daniel-ji commented 8 months ago

Closed in commit 3ef4e1b.

niemasd commented 8 months ago

@daniel-ji I wonder if it makes sense to have a GitHub Action that compares public/data/ref_genomes in this repo vs. the ViralMSA repo:

https://github.com/niemasd/ViralMSA/tree/master/ref_genomes

and copies over any new genomes? Or maybe something like this:

git clone https://github.com/niemasd/ViralMSA.git
rm -rf public/data/ref_genomes
cp -r ViralMSA/ref_genomes public/data/ref_genomes
git add -u public/data/ref_genomes
git commit -m "Updated references" && git push

and then have this GitHub Action run every e.g. hour/day? Thoughts?

daniel-ji commented 8 months ago

Hi @niemasd,

I actually already have a Github Action (https://github.com/Niema-Lab/ViralWasm-Epi/actions/workflows/update-viralmsa.yml) to update ViralMSA 😂. I guess both of us forgot about it, but I'll just go ahead and add a trigger to run on a schedule, done in commit dfbc9fa. I'll keep an eye out to make sure it runs. I only have it run twice a week right now because it might commit an updated zip file for offline download every time even if there's no changes, so I'll test it out and fix accordingly. Thanks!

niemasd commented 8 months ago

Ah, perfect, thanks for making that update! Looks great!

niemasd commented 8 months ago

Hey, @daniel-ji! I just realized that ViralWasm-Consensus also went out-of-date, so maybe it would make the most sense for me to create a separate GitHub repo that just contains all of the preprocessed reference genomes, maybe with a JSON file storing the name mapping, and then all 3 tools include that other repo as a submodule? That way, I can push reference genomes to that single repo, and all 3 tools can pull directly from there (and just update the git submodule upon updates)

I'll create such a repo and try making the git submodule. Then, I'll follow up to have you try to load the reference genome name mappings from the JSON that will be in that directory (rather than hardcoding it into the web app source code)

daniel-ji commented 8 months ago

Hi @niemasd,

Apologies about not covering that. That all sounds good, FYI we have a cron job (https://github.com/Niema-Lab/ViralWasm-Epi/actions/workflows/update-viralmsa.yml) that runs the script (https://github.com/Niema-Lab/ViralWasm-Epi/blob/master/scripts/update-viralmsa.sh) which auto updates ViralWasm-Epi and I could also just add something similar to ViralWasm-Consensus.

Reference genome name mappings are also not hard coded into web app source code in the sense that I pull the repository structure from the ViralMSA repo via https://api.github.com/repos/niemasd/ViralMSA/git/trees/master?recursive=1 as a JSON file and then I fetch from a ref_genomes folder (which is also automatically updated alongside the repository structure JSON in the cron job script). I can clarify more in an email / meeting if you'd like. Thanks!

daniel-ji commented 8 months ago

To follow up @niemasd,

I believe that https://github.com/Niema-Lab/Reference-Genomes as is right now (with just reference genomes) is great for me - it has both the sequences themselves and the names for the select dropdowns. I can create a git submodule within ViralWasm-Epi and ViralWasm-Consensus (would you also want me to do this for ViralMSA and ViralConsensus?). An alternative to the git submodule is what I'm doing now - having a shell script that is run by a Github workflow on a schedule to download from the Reference-Genomes repo and update the tool (that way users can also manually trigger it, if that's of any use?).

To get the file/folder structure of the Reference-Genomes repo, what we could do is create a workflow that runs every commit / push to the repo and automatically creates the repo structure as a JSON file and adds it to the repo OR we could also just pull from https://api.github.com/repos/Niema-Lab/Reference-Genomes/git/trees/main?recursive=1 (which is what I've been doing now). Thoughts?

niemasd commented 8 months ago

Hey, Daniel! I think having a git submodule is the best solution. I'll handle doing it in ViralMSA (just the CLI), and I'll follow up on this GitHub Issue with a link to the changes I made to make that happen, and you can decide how you want to do that in the ViralMSA web app as well as in ViralWasm. I'll update you ASAP

daniel-ji commented 8 months ago

Sounds good, no rush, thanks!

niemasd commented 8 months ago

Hey, Daniel! I just pushed the changes. My "Reference-Genomes" repo has a GitHub Action that automatically creates a JSON file, so that JSON file can be used to get the "human name to reference genome sequence ID" mapping:

https://github.com/Niema-Lab/Reference-Genomes/releases/latest/download/REFS.json

This is how I do it in the updated ViralMSA code:

https://github.com/niemasd/ViralMSA/compare/1.1.42...1.1.43#diff-4a15aa7308ebc2145fff2ff7112a61ae8713d10ff423a19d5fcc6c8b7aefc1f8R66-R71

I did it like this in the ViralMSA Python code to ensure that ViralMSA can be downloaded and installed as a single standalone Python script. But in the web apps, given that you'll have access to the entire repo's filesystem, you should just generate the mapping of "human readable name to reference genome" by making the Reference-Genomes repo a git submodule to replace the ref_genomes folder in each repo, and then just load the human-readable name of reference genome ID/ID.fas from the file ID/ID.name.txt

Can you make this change in all of the web apps? And can you create a CRON GitHub Action that updates the git submodule every e.g. hour?

niemasd commented 8 months ago

Quick update: rather than a CRON GitHub Action, it might be best to just use Dependabot. I made this one for the ViralMSA repo:

https://github.com/niemasd/ViralMSA/blob/master/.github/dependabot.yml

which was based on this post:

https://stackoverflow.com/a/73229032/2134991

Daily checks should be sufficient

niemasd commented 8 months ago

Actually, rather than a git submodule, would it make sense to just git clone the Reference-Genomes repo within whatever GitHub Action you're using to build the website, and then have that GitHub Action run periodically (e.g. every day)?

daniel-ji commented 8 months ago

Sure, will do. I'll take a look in the next couple days if that's okay. Thanks!

niemasd commented 8 months ago

Sounds good, no rush! Thanks!

daniel-ji commented 8 months ago

Hi @niemasd,

I've got it working on my end now, except one thing - fetching the REFS.json file is non-trivial since when I try to request for the file from the JS runtime via the web, I get a CORS error:

"GitHub, like many other web services, implements CORS to enhance web security by restricting how resources on its domains can be requested from scripts running on other domains. This means that if you try to fetch a resource hosted on GitHub from a different domain using JavaScript in the browser, you will likely encounter a CORS error unless GitHub has explicitly allowed such cross-origin requests for that resource."

To fix this, it would quite tedious to configure release asset downloads (I would have to setup a Github API to make a request for the asset file, use an auth-token to download the asset, possibly be rate-limited, and then add that auth-token to the Github Actions deployment of the site via Github) and an easier fix would just to also upload REFS.json to the Reference-Genomes (in addition to making it a release asset) every time there is a new commit / push. I could update the Reference-Genomes compile.yml file with a PR if you're okay with this. Would that be possible? Thank you!

niemasd commented 8 months ago

@daniel-ji Hmm, what about instead of using the REFS.json file at all, in the GitHub Action folder that builds and deploys the website, what if we just add a step after the environment is set up but before the website is built that clones the Reference-Genomes repo and runs the Python script in the repo that builds the JSON file, and then use that local file when building the website?

daniel-ji commented 8 months ago

Done!

niemasd commented 8 months ago

Thanks! I was actually able to patch it to not need the git submodule at all: I just clone the Reference-Genomes repo from scratch. It seems to have worked (I just tested it with the submodule entirely removed):

https://github.com/niemasd/ViralMSA/compare/93af14f...172b312

We should probably take this same approach for the ViralWasm tools

niemasd commented 8 months ago

@daniel-ji Yeah, it seems like ViralWasm-Epi is broken because of my updates to ViralMSA (but ViralWasm-Consensus seems to work)

EDIT: ViralWasm-Consensus will probably break upon next push, so yeah, would be good to update both ViralWasm-Epi and ViralWasm-Consensus to load the reference genome list in the exact same way that ViralMSA is now doing it

daniel-ji commented 8 months ago

@niemasd ViralWasm-Epi has been fixed, I'll work on ViralWasm-Consensus soon!

niemasd commented 8 months ago

Awesome, thank you!

niemasd commented 3 months ago

Forgot to close this GitHub Issue, but this seems to be resolved now