Closed ammaraziz closed 8 months ago
Nextclade developer here. Let me know if you need help upgrading to v3 (feel free to either mention my nickname loudly or submit issues).
--output-errors is now --output-csv, the output is different than the previous output
The format is the same, it's just that --output-csv and --output-tsv contain all possible columns, including the ones related to errors and warnings, while --output-errors only contained the ones related to errors and warnings. So if your CSV/TSV processing does not depend on the number of columns or their order, then --output-tsv/--output-csv is a drop-in replacement.
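One way to make downstream parsing independent of column count and order is to select columns by header name rather than by position. The following is a minimal sketch, not part of Nextclade or of this workflow: select_tsv_columns is a hypothetical helper, and the column names in the usage line (seqName, clade, errors, warnings) are assumed to be present in your output file.

```shell
# Hypothetical helper: print selected columns of a TSV by header name,
# regardless of where those columns sit in the file.
# Usage: select_tsv_columns "seqName,clade,errors,warnings" < nextclade.tsv
select_tsv_columns() {
  awk -F'\t' -v OFS='\t' -v wanted="$1" '
    NR == 1 {
      # Map every header name to its column position.
      n = split(wanted, names, ",")
      for (i = 1; i <= NF; i++) index_of[$i] = i
      # Resolve each requested name, failing loudly if one is missing.
      for (j = 1; j <= n; j++) {
        if (!(names[j] in index_of)) {
          printf "missing column: %s\n", names[j] > "/dev/stderr"
          exit 1
        }
        cols[j] = index_of[names[j]]
      }
    }
    {
      # Emit the requested columns in the requested order (header included).
      line = $(cols[1])
      for (j = 2; j <= n; j++) line = line OFS $(cols[j])
      print line
    }
  '
}
```

Because the lookup is done on the header row, extra columns added by --output-tsv are simply ignored, and a renamed or missing column produces a non-zero exit status instead of silently shifted data.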
tag.json, qc.json and virus_properties.json got merged into a single file, called pathogen.json
If official datasets are used, then this probably does not matter. As far as I understand, the request was to keep official v2 datasets up-to-date. However I also see that there are copies of v2 datasets stored in this repo. I don't know if they are exact copies or modified in some way or used outside of running nextclade with them. I am not familiar with this project, but let me know if dataset format change in v3 is significant here.
primers.csv was removed.
We were asked to bring back the primers.csv feature. The input format was so bad that we had no idea anyone was actually using it. The comeback hasn't happened yet, but it is planned for the very near future (roughly a couple of days to a couple of weeks).
If official datasets are used, then this probably does not matter. As far as I understand, https://github.com/nextstrain/nextclade/issues/1397 was to keep official v2 datasets up-to-date. However I also see that there are copies of v2 datasets stored in this repo. I don't know if they are exact copies or modified in some way or used outside of running nextclade with them. I am not familiar with this project, but let me know if dataset format change in v3 is significant here.
Users (i.e. me!) can specify a flag that retrieves the latest version of the SC2 dataset. So as far as I know, they are identical. The only issue is the lack of V2 dataset updates.
Good to know the majority of changes only affect the parsing in a minor way.
ONT folks, if you create a new docker image of the nextclade v3 I can do the testing and submit a pull request to the workflow-glue cli tool.
Hi Both,
Thanks for highlighting this and offering your help in the matter. We will put it on our development roadmap and hopefully have a fix soon!
@cjalder Let us know if you have any questions migrating from 2->3, I'm another dev of Nextclade.
The first release that is v3-only, i.e. the point at which v2 datasets start falling behind, would be in around 2 weeks.
New lineages wouldn't appear - but everything keeps working otherwise. Of course one wants to transition to v3, this is just to allow a severity estimate.
There's a way you could extend your runway by downloading and overwriting just the tree from v3 and otherwise continue with v2.
This could be as simple as adding the following to the dataset download code:
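# Optionally overwrite the v2 dataset's tree with the tree from a pinned v3 release.
# The flag must be set to the string "true"; unset or any other value skips the download.
if [ "${USE_NEXTCLADE_V3_TREE:-false}" = "true" ]; then
  NEXTCLADE_V3_RELEASE_TIMESTAMP="2024-01-16--20-31-02Z"
  curl "https://data.clades.nextstrain.org/v3/nextstrain/sars-cov-2/wuhan-hu-1/orfs/${NEXTCLADE_V3_RELEASE_TIMESTAMP}/tree.json" > dataset_path/tree.json
fi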
to override the v2 tree with a specified v3 tree.
Nextclade v2 can use that v3 tree without issues.
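The snippet above writes straight into tree.json, so a failed download would leave the dataset with an empty or broken tree. A slightly more defensive variant is sketched below. The function names nextclade_v3_tree_url and override_tree are hypothetical, and the URL layout is copied from the snippet above rather than checked against the server.

```shell
# Hypothetical helper: build the URL of a v3 tree for a given release timestamp.
nextclade_v3_tree_url() {
  printf 'https://data.clades.nextstrain.org/v3/nextstrain/sars-cov-2/wuhan-hu-1/orfs/%s/tree.json\n' "$1"
}

# Hypothetical helper: download the tree next to its destination first, and only
# replace the existing tree.json if the download succeeded and is non-empty.
# Usage: override_tree "2024-01-16--20-31-02Z" dataset_path
override_tree() {
  ot_url="$(nextclade_v3_tree_url "$1")"
  ot_dir="$2"
  ot_tmp="$(mktemp "$ot_dir/tree.json.XXXXXX")" || return 1
  if curl --fail --silent --show-error --location "$ot_url" --output "$ot_tmp" \
      && [ -s "$ot_tmp" ]; then
    mv "$ot_tmp" "$ot_dir/tree.json"
  else
    # Leave the existing v2 tree untouched on any failure.
    rm -f "$ot_tmp"
    return 1
  fi
}
```

Pinning the timestamp keeps runs reproducible; moving to a newer tree is then a one-line change to the argument passed to override_tree.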
This could be added (after nextflowification from my bash above) here:
You can see how minimal the required changes are to keep using the v2 binary but keep getting dataset updates in this PR.
I haven't tested the nextflow, but the bash works locally for me: #109
Thanks all
We're updating to nextclade v3 in the next release, we'll try to get this out as soon as we can.
Matt
@mattdmem Thank you for the V3 upgrade!
Operating System
Ubuntu 22.04
Other Linux
No response
Workflow Version
All
Workflow Execution
Command line
EPI2ME Version
No response
CLI command run
No response
Workflow Execution - CLI Execution Profile
None
What happened?
Nextclade has been updated to V3.0.0, and this update includes a change to the datasets. The v2.X datasets are in archive mode and all updates will be pushed to V3. This pipeline is currently set up to use V2.14.0.
There are a few other changes that are worth mentioning:
--output-errors is now --output-csv, the output is different than the previous output
tag.json, qc.json and virus_properties.json got merged into a single file, called pathogen.json
primers.csv was removed.
See full details of changes here: https://github.com/nextstrain/nextclade_data/blob/master/docs/migration-guide-v3.md
Without updates to the workflow-glue report and the ONT docker images, all lineage calls will be outdated.
Relevant log output
Application activity log entry
No response