RIVM-bioinformatics / ViroConstrictor

ViroConstrictor is a pipeline designed to process raw FastQ data from viral amplicon-based sequencing and generate biologically correct consensus sequences of the given viral genome
https://rivm-bioinformatics.github.io/ViroConstrictor/
GNU Affero General Public License v3.0

ViroConstrictor multi-sequence analysis running error #96

Closed Chrisxie03 closed 7 months ago

Chrisxie03 commented 7 months ago

Hi everyone, I have installed ViroConstrictor version 1.4.1 via Conda. I am running a multi-sequence analysis with the following command:

ViroConstrictor -i 'data/' -o 'output_VC1/' -samples 'samplesheet.tsv' --platform 'nanopore' -at 'end-to-end'

When I run the command, I get the following output in my terminal:


[08/04/24 16:36:19] INFO ViroConstrictor version: 1.4.1
[08/04/24 16:36:19] INFO Succesfully read global configuration file
[08/04/24 16:36:19] INFO Valid FastQ files were found in the input directory. ('data/')
[08/04/24 16:36:19] INFO Successfully parsed all command line arguments
[08/04/24 16:36:19] WARNING 2 Ambiguous nucleotides found in file /mnt/studentfiles/2024/2024MBI08/viroconstrictor/influenza_reference.fasta in record A-HA-H1-NC_026433: R
Please check whether this is intended.
[08/04/24 16:36:19] WARNING 1 Ambiguous nucleotides found in file /mnt/studentfiles/2024/2024MBI08/viroconstrictor/influenza_reference.fasta in record A-PB1-PB1-NC_007375: N
Please check whether this is intended.
[08/04/24 16:36:19] WARNING 1 Ambiguous nucleotides found in file /mnt/studentfiles/2024/2024MBI08/viroconstrictor/influenza_reference.fasta in record A-NA-N9-NC_026429: Y
Please check whether this is intended.
[08/04/24 16:36:19] WARNING 1 Ambiguous nucleotides found in file /mnt/studentfiles/2024/2024MBI08/viroconstrictor/influenza_reference.fasta in record A-NP-NP-NC_026436: R
Please check whether this is intended.
[08/04/24 16:36:19] WARNING 1 Ambiguous nucleotides found in file /mnt/studentfiles/2024/2024MBI08/viroconstrictor/influenza_reference.fasta in record A-PA-PA-NC_026437: R
Please check whether this is intended.

Traceback (most recent call last):
  File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/bin/ViroConstrictor", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/main.py", line 144, in main
    update(sys.argv, parsed_input.user_config)
  File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/update.py", line 87, in update
    ask_prompt = conf["GENERAL"]["ask_for_update"] == "yes"
  File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/configparser.py", line 1273, in __getitem__
    raise KeyError(key)
KeyError: 'ask_for_update'
___
After seeing the 'ask_for_update' error, I added the --skip-updates flag to my command. ViroConstrictor then started the Match-reference process, but that ended in the following error:
___
[08/04/24 16:10:13] ERROR IndexError in file /mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/workflow/match_ref.smk, line 74:
list index out of range
File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/workflow/match_ref.smk", line 91, in <module>
File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/workflow/match_ref.smk", line 74, in segmented_ref_groups
File "/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/lib/python3.11/site-packages/ViroConstrictor/workflow/match_ref.smk", line 74, in <setcomp>

/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/bin/ViroConstrictor:10: DeprecationWarning: The parameter "ln" is deprecated since v2.5.2. Instead of ln=1 use new_x=XPos.LMARGIN, new_y=YPos.NEXT.
  sys.exit(main())
___

I tried to look into the match_ref.smk file but I could not figure out how the index error occurred.
I have also looked at the Deprecation warning but could not find the parameter "ln" in the code.

If anyone knows whether I have to use the --skip-updates flag and how to resolve the last two errors, I would be very thankful!

kind regards,

Chris

florianzwagemaker commented 7 months ago

Hi @Chrisxie03

Thank you for your message and for providing the log messages. Sorry to see that you're experiencing issues with ViroConstrictor. In your message and logs I see two independent issues: one is a bug (the auto-updater) and the other is, I believe, mostly a documentation issue.

First of all, the auto-updating error you're seeing is a bug, so thank you for bringing this to our attention. The program exits because it believes a necessary configuration setting is missing. That isn't right, however: this specific configuration setting shouldn't be required.
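
To illustrate the mechanism, here is a minimal sketch of how a missing key in a configparser section produces this KeyError, and how reading the key with a fallback would treat it as optional. This is not ViroConstrictor's actual code or its planned fix: the config contents and the auto_update key name are assumptions for illustration; only the GENERAL section and the ask_for_update key come from the traceback.

```python
import configparser

# Hypothetical config contents (assumption): the GENERAL section exists,
# but the "ask_for_update" key was never written to it.
conf = configparser.ConfigParser()
conf.read_string("""
[GENERAL]
auto_update = yes
""")

# Direct indexing raises KeyError when the key is absent,
# which is the same kind of failure as in the traceback.
try:
    ask_prompt = conf["GENERAL"]["ask_for_update"] == "yes"
except KeyError as err:
    missing_key = err.args[0]
    print(f"missing key: {missing_key}")

# Reading with a fallback treats the key as optional instead.
ask_prompt = conf["GENERAL"].get("ask_for_update", fallback="no") == "yes"
print(ask_prompt)
```

With the fallback, a config file written without this key no longer stops the program; the prompt is simply treated as disabled.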

It can be worked around pretty easily though. Please delete any existing configuration file(s) by running rm ~/.ViroConstrictor_* and then run the pipeline as normal; you'll be prompted with the first-time setup questions again. Answer "no" to the auto-updating prompt and "yes" to the subsequent asking-to-update prompt.
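
Before deleting anything, it may help to confirm which configuration files are affected. The sketch below is a read-only check under two assumptions: that the files matching ~/.ViroConstrictor_* are INI-style files readable by configparser (inferred from the traceback), and that the relevant key lives in the GENERAL section. The helper name find_incomplete_configs is hypothetical and not part of ViroConstrictor.

```python
import configparser
from pathlib import Path


def find_incomplete_configs(home: Path) -> list[Path]:
    """Return config files under `home` that lack GENERAL/ask_for_update."""
    incomplete = []
    for path in sorted(home.glob(".ViroConstrictor_*")):
        if not path.is_file():
            continue
        conf = configparser.ConfigParser()
        try:
            conf.read(path)
        except (configparser.Error, UnicodeDecodeError):
            # Files that cannot be parsed are reported as incomplete too.
            incomplete.append(path)
            continue
        # has_option() returns False if the section or the key is missing.
        if not conf.has_option("GENERAL", "ask_for_update"):
            incomplete.append(path)
    return incomplete


if __name__ == "__main__":
    # Only reads and reports; nothing is deleted or modified.
    for path in find_incomplete_configs(Path.home()):
        print(f"missing ask_for_update: {path}")
```

Any file it lists is a candidate for the deletion step described above.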

From this point onwards ViroConstrictor will no longer update fully automatically to every new version. Instead, you'll get a yes/no prompt asking whether you want to update to the newest available version. With this change you should no longer experience the issue. You can of course still run the pipeline with the --skip-updates flag, but if you do, I'd recommend that you at least update manually to the current latest version (1.4.2) beforehand, as this version has some bugfixes that should help with the rest of the analysis. This can be done with mamba update viroconstrictor or mamba install viroconstrictor==1.4.2

I'll see if I can fix the updater functionality for the next release.


The second error you're experiencing is unrelated. I believe it has to do with the formatting of the inputs, in this case specifically the references, and with the lack of proper documentation for this.

From the log messages I see you're trying to analyze Influenza A data with the match-reference process enabled, and I'm going to assume also with the --segmented flag/mode set to True for these samples.

I quickly want to clarify what the various modes mean specifically.

  1. If the match-reference and segmented modes are both disabled then a multi-reference analysis is still possible. ViroConstrictor will simply run the entire analysis for every possible reference. This happens on a per-sample basis.

    Using this setting works if your (multi-)reference fasta looks as follows:

    >A-HA-H1-NC_026433
    atgaaagtaaaactactggtcctg...
    >A-MP-MP-NC_026433
    atgagtcttctaaccgaggtcgaa...
    >A-NA-N9-NC_026429
    atgaacccaaatcaaaagataata...
    >A-NP-NP-NC_026436
    atgagtgacatcgaagccatggcg...
    >A-NS-NS-NC_007375
    atggattcccacactgtgtcaagc...

  2. If the match-reference mode is enabled and segmented mode is disabled, then ViroConstrictor will pick the one reference that best fits the provided data. This happens on a per-sample basis. For example, if a multi-reference file containing 100 fasta references is provided for a sample, ViroConstrictor will try all 100 references for this sample and pick the one that fits this specific sample best.

    Use this setting when, for example, the sequencing protocol is ambiguous across various subtypes of the same non-segmented viral target. Please see below for an example of the multi-reference fasta file.

    >measles_subtype1
    atgaaagtaaaactactggtcctg...
    >measles_subtype2
    atgagtcttctaaccgaggtcgaa...
    >measles_subtype3
    atgaacccaaatcaaaagataata...
    >measles_subtype4
    atgagtgacatcgaagccatggcg...

  3. If the match-reference and segmented modes are both enabled then ViroConstrictor will search for the best-fitting reference for each segment of a virus. This again happens on a per-sample basis but requires some specific formatting for the multi-reference fasta.
    In this case, ViroConstrictor will choose one best-fitting reference per segment per sample.

    To make this work, the multi-reference fasta with all the possible references for all the segments must be formatted as follows:

    >A.HA_01 HA|H1|H1N1
    atgaaagtaaaactactggtcc...
    >A.HA_02 HA|H3|H3N2
    atgaagactatcattgctttga...
    >A.HA_03 HA|H5|H5N1
    atgaagactatcattgctttga...
    
    >A.MP_01 MP|MP|H1N1
    atgagtcttctaaccgaggtcg...
    >A.MP_02 MP|MP|H3N2
    atgagccttcttaccgaggtcg...
    >A.MP_03 MP|MP|H5N1
    atgagtcttctaaccgaggtcg...
    
    >A.NA_01 NA|N1|H1N1
    atgaacccaaatcaaaagataa...
    >A.NA_02 NA|N2|H3N2
    atgaatccaaatcaaaagataa...
    >A.NA_03 NA|N1|H5N1
    atgaatccaaatcaaaagataa...

    The formatting comes down to the following structure:
    Personal identifier Segment-name|Segment-subtype|Extra-information

    In the final analysis, the Segment-name and your personal identifier get swapped so the folders with results won't get messy.
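
The header structure described for the third mode can be sketched as a small parser that groups the candidate references per segment. This is only an illustration of the layout, not ViroConstrictor's own parsing code; the names RefHeader, parse_header and group_by_segment are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class RefHeader(NamedTuple):
    identifier: str  # personal identifier, e.g. "A.HA_01"
    segment: str     # segment name, e.g. "HA"
    subtype: str     # segment subtype, e.g. "H1"
    extra: str       # extra information, e.g. "H1N1"


def parse_header(line: str) -> RefHeader:
    """Parse '>identifier segment|subtype|extra' into its four fields."""
    identifier, _, description = line.lstrip(">").strip().partition(" ")
    # Split on the first two pipes only, so the extra field may contain "|".
    fields = description.strip().split("|", 2)
    if not identifier or len(fields) != 3 or not all(fields):
        raise ValueError(f"header does not match the expected layout: {line!r}")
    return RefHeader(identifier, *fields)


def group_by_segment(header_lines):
    """Map each segment name to the identifiers of its candidate references."""
    groups = defaultdict(list)
    for line in header_lines:
        header = parse_header(line)
        groups[header.segment].append(header.identifier)
    return dict(groups)


headers = [
    ">A.HA_01 HA|H1|H1N1",
    ">A.HA_02 HA|H3|H3N2",
    ">A.MP_01 MP|MP|H1N1",
    ">A.NA_01 NA|N1|H1N1",
]
print(group_by_segment(headers))
# {'HA': ['A.HA_01', 'A.HA_02'], 'MP': ['A.MP_01'], 'NA': ['A.NA_01']}
```

A header without the space-separated description, such as >A-HA-H1-NC_026433, is rejected by parse_header with a ValueError, which makes the formatting requirement explicit.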

For your specific case, I think the formatting of the reference file did not really fit into these setups.

If you want to analyze influenza data and the multi-reference only has one option per segment then it's fine to leave the match-reference and segmented modes disabled.

If you do have multiple reference options per segment then it's necessary to format the fasta headers as in the example above.

This is also where the splitting error (the IndexError) currently comes from. Because match-reference and segmented are both enabled, ViroConstrictor expects the reference fasta to be formatted as in the example above. That isn't the case here, however, and therefore it exits with an error.
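
As a rough illustration of how such a failure can arise, the sketch below splits each header on whitespace and reads the segment name from the description. The function segment_names is a hypothetical stand-in, guided only by the traceback's mention of a set comprehension and "list index out of range"; it is not the real code at match_ref.smk line 74. headers_without_description is an equally hypothetical pre-flight check.

```python
def segment_names(headers):
    """Collect segment names from '>identifier segment|subtype|extra' headers.

    Hypothetical stand-in for the grouping step, not the real match_ref.smk code.
    """
    # split() separates identifier and description; [1] assumes a description exists.
    return {header.split()[1].split("|")[0] for header in headers}


def headers_without_description(headers):
    """Pre-flight check: return headers with nothing after the identifier."""
    return [header for header in headers if len(header.split()) < 2]


formatted = [">A.HA_01 HA|H1|H1N1", ">A.NA_01 NA|N1|H1N1"]
unformatted = [">A-HA-H1-NC_026433", ">A-NA-N9-NC_026429"]

print(sorted(segment_names(formatted)))  # ['HA', 'NA']

try:
    segment_names(unformatted)
except IndexError as err:
    error_message = str(err)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range

# Lists the headers that would need reformatting before enabling both modes.
print(headers_without_description(unformatted))
```

Running a check like this on the reference fasta before starting the pipeline would point at the offending headers directly instead of a bare IndexError.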


The last message you see can be ignored:

/mnt/studentfiles/2024/2024MBI08/mambaforge/envs/viroconstrictor/bin/ViroConstrictor:10: DeprecationWarning: The parameter "ln" is deprecated since v2.5.2. Instead of ln=1 use new_x=XPos.LMARGIN, new_y=YPos.NEXT.
sys.exit(main())

This is merely a deprecation message about something that will be replaced/fixed in a future version; it has no impact on the analysis.

I hope I was able to clear some things up for you. If you have any other questions, please let us know.

Kind regards, Florian

Chrisxie03 commented 7 months ago

Hi @florianzwagemaker,

Thank you for the comment, ViroConstrictor is now running!!

Kind regards,

Chris