DennisSchmitz / Jovian_archive

Metagenomics/viromics pipeline that focuses on automation, user-friendliness and a clear audit trail. Jovian aims to empower classical biologists and wet-lab personnel to do metagenomics/viromics analyses themselves, without bioinformatics expertise.
GNU Affero General Public License v3.0

The typingtools (NoV, EV, HAV) rules give an error (proxy) and crash. Explanation. #29

Closed DennisSchmitz closed 4 years ago

DennisSchmitz commented 5 years ago

Please see the updated post below. In newer versions of the pipeline you should no longer get the error described in this first post.

There is a bug in the Viral_typing part of the pipeline that causes Jovian to stop prematurely and not finish its analysis. As a result, the Jovian Report does not show some tables and renders improperly.

The error message:

RuleException:

CalledProcessError in line 576 of /PATH/Snakefile:

Command 'source /mnt/miniconda/bin/activate '/PATH/.snakemake/conda/ccb14a27'; set -euo pipefail;  awk -F "\t" '$6 == "Norwalk virus" {print ">" $2 "\n" $24}' < data/tables/NAME_taxClassified.tsv 2> logs/Viral_typing_NAME.log 1> data/virus_typing_tables/NAME_NoV.fa

if [ -s "data/virus_typing_tables/NAME.fa" ]

then

    curl -s --data-urlencode fasta-sequence@data/virus_typing_tables/NAME_NoV.fa https://www.rivm.nl/mpf/typingservice/norovirus 2>> logs/Viral_typing_NAME.log 1> data/virus_typing_tables/NAME_NoV.xml

    python bin/typingtool_NoV_XML_to_csv_parser.py NAME data/virus_typing_tables/NAME_NoV.xml data/virus_typing_tables/NAME_NoV.csv 2>> logs/Viral_typing_NAME.log

else

    echo -e "No scaffolds with species == Norwalk Virus in sample:      NAME." >> logs/Viral_typing_Undetermined_S0.log

    touch data/virus_typing_tables/NAME.xml

    touch data/virus_typing_tables/NAME.csv

fi

awk -F "\t" '$8 == "Picornaviridae" {print ">" $2 "\n" $24}' < data/tables/NAME.tsv 2>> logs/Viral_typing_NAME.log 1> data/virus_typing_tables/NAME_EV.fa

if [ -s "data/virus_typing_tables/NAME_EV.fa" ]

then

    curl -s --data-urlencode fasta-sequence@data/virus_typing_tables/NAME_EV.fa https://www.rivm.nl/mpf/typingservice/enterovirus 2>> logs/Viral_typing_NAME.log 1> data/virus_typing_tables/NAME_EV.xml

    python bin/typingtool_EV_XML_to_csv_parser.py NAME data/virus_typing_tables/NAME_EV.xml data/virus_typing_tables/NAME_EV.csv 2>> logs/Viral_typing_NAME.log

else

    echo -e "No scaffolds with family == Picornaviridae in sample:       NAME." >> logs/Viral_typing_NAME.log

    touch data/virus_typing_tables/NAME_EV.xml

    touch data/virus_typing_tables/NAME_EV.csv

fi' returned non-zero exit status 1.

  File "/PATH/Snakefile", line 576, in __rule_Viral_typing

  File "/PATH/.conda/envs/Jovian_master/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Removing output files of failed job Viral_typing since they might be corrupted:

data/virus_typing_tables/NAME_NoV.fa, data/virus_typing_tables/NAME_EV.fa, data/virus_typing_tables/NAME_NoV.xml, data/virus_typing_tables/NAME_EV.xml, data/virus_typing_tables/NAME_NoV.csv
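
For readers less familiar with awk, the first step of the failing command can be sketched in Python as below. This is only an illustration of what the `awk -F "\t" '$6 == "Norwalk virus" {print ">" $2 "\n" $24}'` call does, not the code Jovian runs. The function name `scaffolds_to_fasta` is hypothetical, and reading column 2 as the scaffold name and column 24 as the sequence is an assumption based on the FASTA output the command writes.

```python
import csv


def scaffolds_to_fasta(tsv_lines, column, value):
    """Yield FASTA records for rows whose 1-based `column` equals `value`.

    Hypothetical helper mirroring the awk filter in the error message.
    """
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # awk fields are 1-based: $2 is fields[1] and $24 is fields[23].
        # Rows with fewer than 24 fields are skipped here, whereas awk
        # would print them with an empty sequence.
        if len(fields) >= 24 and fields[column - 1] == value:
            yield f">{fields[1]}\n{fields[23]}\n"
```

Calling `scaffolds_to_fasta(handle, 6, "Norwalk virus")` corresponds to the NoV step and `scaffolds_to_fasta(handle, 8, "Picornaviridae")` to the EV step. If nothing is yielded, the resulting FASTA file is empty and the `[ -s ... ]` test in the rule takes the `else` branch.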

The reason:

We are currently using the publicly available Norovirus, Hepatitis A and Enterovirus web-based typingtools of Kroneman et al. 2011, hosted by the RIVM. These typingtools were originally intended for Sanger sequences.

We've found they cannot keep up with the number of queries sent by the pipeline, especially now that more people are using Jovian. As a consequence, the typingtool web-server becomes overloaded and crashes, which in turn results in the Jovian error shown above.

We are aware of this problem; however, it is not trivial to solve and will take some time. In the meantime we are working on a short-term work-around (described below). For now, should you encounter this problem, please do the following troubleshooting and let us know the results:

Troubleshooting:

Check if the typing tools are available by clicking here. Either the website is available, or you'll get a time-out or 404 error.

If you get a time-out or 404 error this means the web-service has crashed. Please contact the developers either via a GitHub issue or via mail and we will reboot the server ASAP.

If the link does work, the web-services are available and the error was most likely caused by a sporadic connection problem. Please try to run the pipeline again in ~5 minutes. If it fails again, it is not the same issue, so please open a separate GitHub issue and describe what you did and what error message you received.
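
The troubleshooting steps above could also be scripted. The sketch below is a minimal, hedged example and not part of Jovian: the function names are hypothetical, the URL is copied from the error message, and treating a plain GET on that endpoint as a health check is an assumption about how the service behaves.

```python
import time
import urllib.error
import urllib.request

# Endpoint copied from the error message; using a plain GET on it as a
# health check is an assumption, not a documented feature of the service.
NOV_URL = "https://www.rivm.nl/mpf/typingservice/norovirus"


def service_status(url, opener=urllib.request.urlopen, timeout=30):
    """Return 'up', 'http-error <code>' or 'unreachable' for one request."""
    try:
        with opener(url, timeout=timeout) as response:
            if response.status < 400:
                return "up"
            return f"http-error {response.status}"
    except urllib.error.HTTPError as err:
        # e.g. a 404 or a proxy error returned by the web-server
        return f"http-error {err.code}"
    except OSError:
        # covers URLError, time-outs and other connection failures
        return "unreachable"


def wait_for_service(url, attempts=3, pause_seconds=300,
                     check=service_status, sleep=time.sleep):
    """Check the service up to `attempts` times, pausing between tries.

    The default pause of 300 seconds follows the '~5 minutes' advice above.
    """
    for attempt in range(1, attempts + 1):
        if check(url) == "up":
            return True
        if attempt < attempts:
            sleep(pause_seconds)
    return False
```

A call such as `wait_for_service(NOV_URL)` returning `False` would correspond to the "web-service has crashed" case, in which the developers should be contacted. Returning `True` only shows that the server answers, not that a typing query will succeed.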

Short term solution/work-around:

The Viral_typing rule will be removed from Jovian in v0.9.2 and included as a separate, on-demand script that can be started from within the report. The rationale is that not everyone wants the NoV, HAV and EV typing results, so making them optional reduces the load on the web-services. Those who do want the results can activate the script on-demand via the Jovian report. The same problem can still occur there, but the rest of the pipeline is then unaffected: it will finish correctly and the Jovian Report will render properly.

Long term solution:

  1. We are working on the efficiency of the typing tool services. We are also improving the queuing capabilities and the hardware. This should stop the web-services from crashing.
  2. Even further in the future we hope to be able to package the typing software within the pipeline, but this is currently not possible.

DennisSchmitz commented 5 years ago

Workaround is implemented in version V0.9.2 (1ac0e94).

The typingtool processes have been removed from the main Jovian analysis and can now be run with: bash jovian --virus-typing [NoV|EV|HAV] N.B. this only works if a Jovian analysis has already been performed; otherwise you'll get an error.
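
As a hedged illustration of the N.B. above, a wrapper could check for earlier Jovian output before starting the typing step. Everything in this sketch is an assumption except the command itself: the function name is hypothetical, and using `data/tables/*_taxClassified.tsv` (a path seen in the original error message) as evidence of a finished analysis is a guess, not a documented requirement.

```python
from pathlib import Path

# Keywords taken from the command shown above.
VALID_KEYWORDS = ("NoV", "EV", "HAV")


def virus_typing_command(keyword, workdir="."):
    """Return the argument list for the typing command, or raise early.

    Hypothetical helper: it assumes a finished Jovian analysis leaves at
    least one data/tables/*_taxClassified.tsv file in `workdir`.
    """
    if keyword not in VALID_KEYWORDS:
        raise ValueError(f"unknown typing keyword: {keyword!r}")
    tables = Path(workdir) / "data" / "tables"
    if not any(tables.glob("*_taxClassified.tsv")):
        raise FileNotFoundError(
            "no *_taxClassified.tsv found; run a Jovian analysis first"
        )
    return ["bash", "jovian", "--virus-typing", keyword]
```

The returned list can be passed to `subprocess.run(...)` from the Jovian directory. Failing early in this way only gives a clearer message than the error the script itself would produce.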

We kindly ask you to use these services sparingly lest the servers become overloaded and crash. These services are intended for clinical and public health applications that require sub-species level taxonomic classification, e.g. for outbreak tracing with accurate metadata.

We are working on long-term fixes that would allow automated virus typing; however, we have no ETA for them yet.

DennisSchmitz commented 5 years ago

Linked to #51, small update.

Apparently someone used a bot to continuously send queries... Hence all the proxy errors and overloading. That should be fixed now.

In the meantime, an improved version of the typingtools is online and being tested. Once testing is successful, it will replace the old version and should be much faster and more stable.

DennisSchmitz commented 5 years ago

Just an update: the new typing tool web-services are being tested and hopefully will come online after the holiday season.