not understandable Error message about primers

irinatu commented 2 years ago

Hello, I am running an analysis for three zones, so I have 3 pairs of primers in the file "prim", but I got the following message:

Started ShapeMapper v2.1.5 at 2022-06-23 02:38:19
Output will be logged to test_Long_shapemapper_log.txt
Running from directory: /home/irina/SHapeMap/ShapeMap_2022_06_13
args: --name test_Long --target file.fasta --amplicon --primers prim --overwrite --min-depth 1000 --modified --folder plus --untreated --folder minus --denatured --folder den --preserve-order --correct-seq --folder minus --output-processed-reads --output-aligned-reads --output-parsed-mutations --output-counted-mutations
Warning: no random primer length was specified, but at least one RNA is longer than a typical directed-primer amplicon. Use --random-primer-len to exclude mutations within primer binding regions.
Created pipeline at 2022-06-23 02:38:19
Running PrimerLocator at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:19
Running FastaFormatChecker at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:19
Running BowtieIndexBuilder at 2022-06-23 02:38:19 . . .
. . . done at 2022-06-23 02:38:20
Running process group 4 at 2022-06-23 02:38:20 . . .
Including these components:
Appender1 . . . started at 2022-06-23 02:38:20
Appender2 . . . started at 2022-06-23 02:38:20
ProgressMonitor . . . started at 2022-06-23 02:38:20
QualityTrimmer1 . . . started at 2022-06-23 02:38:20
QualityTrimmer2 . . . started at 2022-06-23 02:38:20
Interleaver . . . started at 2022-06-23 02:38:20
Merger . . . started at 2022-06-23 02:38:20
Tab6Interleaver . . . started at 2022-06-23 02:38:20
BowtieAligner . . . started at 2022-06-23 02:38:20
MutationParser . . .
Traceback (most recent call last):
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 350, in format_command
    formatted = command.format(**values)
KeyError: 'primers'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/cli.py", line 141, in <module>
    run(sys.argv)
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/cli.py", line 70, in run
    success = pipeline.run()
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/pipeline.py", line 717, in run
    component.start_process(verbose=self.verbose)
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 666, in start_process
    formatted_cmd = self.format_command(self.cmd())
  File "/home/irina/bin/shapemapper-2.1.5/internals/python/pyshapemap/component.py", line 356, in format_command
    raise KeyError(msg)
KeyError: "Error: for component MutationParser, 'primers' node not linked to a filename or parameter, or that node name does not exist."

Could you explain what that could mean, please? Best regards, Irina.

Psirving commented 2 years ago

This looks like a bug. I cannot get --amplicon to work with --correct-seq. However, I don't think the sequence correction component needs to know the primer sequences. I'll look into this further. For now I think you can get around this by making two separate calls to shapemapper. Let me know if this solves your issue.

Only perform sequence correction.

shapemapper --name test_Long --target file.fasta --out correct-sequence --correct-seq --folder minus

Locate the corrected fasta file and run the full shapemapper pipeline.

shapemapper --name test_Long --target {path to new fasta file} --amplicon --primers prim --overwrite --min-depth 1000 --modified --folder plus --untreated --folder minus --denatured --folder den --preserve-order --output-processed-reads --output-aligned-reads --output-parsed-mutations --output-counted-mutations
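The two calls above could be wrapped in a small script along these lines. This is only a sketch: all flags are copied from the commands in this thread, but the function names, the DRY_RUN switch, and the argument that carries the corrected FASTA path are hypothetical. The location of the corrected file is not stated here, so you still have to find it yourself after the first step.

```shell
#!/usr/bin/env bash
# Sketch of the two-step workaround; not part of ShapeMapper itself.
set -eu

# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it. Set DRY_RUN=0 to actually execute shapemapper.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: sequence correction only.
step1_correct() {
    run shapemapper --name test_Long --target file.fasta \
        --out correct-sequence --correct-seq --folder minus
}

# Step 2: full pipeline on the corrected FASTA, without --correct-seq.
# $1: path to the corrected FASTA file (locate it yourself after step 1).
step2_pipeline() {
    local corrected_fasta="$1"
    run shapemapper --name test_Long --target "$corrected_fasta" \
        --amplicon --primers prim --overwrite --min-depth 1000 \
        --modified --folder plus --untreated --folder minus \
        --denatured --folder den --preserve-order \
        --output-processed-reads --output-aligned-reads \
        --output-parsed-mutations --output-counted-mutations
}
```

Call step1_correct first, find the corrected FASTA it produced, then call step2_pipeline with that path as its only argument. Keeping the dry-run default makes it easy to check both command lines before anything runs.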

irinatu commented 2 years ago

Yes, thank you very much. This definitely solved the problem. It seems --correct-seq doesn't change my reference sequence, but leaving --correct-seq out of the shapemapper command line helps. I thought that adding --correct-seq to the command line would run the sequence correction first and then automatically run the pipeline on the corrected sequence. Is there also a tutorial available that explains when to use which options?

Psirving commented 2 years ago

I'm glad this worked for you. You are correct about the expected behavior when using --correct-seq. However, there was a bug in this component of ShapeMapper. It will be fixed in the next version.

There's no tutorial for all of the options except what is found in the documentation. Take a look at the usage examples in the README. There's also a lot of information in /docs.

Here's a brief explanation of some of your flags in case you are unsure:

--preserve-order, --output-processed-reads, and --output-aligned-reads are useful for in-depth debugging. You probably don't need these.
--output-parsed-mutations produces a parsed.mut file required for RingMapper, PairMapper, and DanceMapper.
--output-counted-mutations produces a table with mutation rates broken down by mutation type, e.g. A->C, deletion, multinucleotide deletion, etc. This is mostly for technology development purposes or if you are looking for evidence of a specific RNA modification such as A->I.

I often use --per-read-histograms for quality control when using RingMapper, PairMapper, or DanceMapper. It produces a table in the log.txt file containing the read length distribution and the mutations-per-molecule distribution. For these analyses, mutations per molecule should be high (>=5), and reads should be long.
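As a rough illustration, the QC flag can simply be appended to a normal shapemapper call. The sketch below only prints such a command. The helper name qc_command and its FASTA argument are hypothetical, and the remaining flags are a subset of those already used in this thread, so adjust them to your own run.

```shell
# Print (not run) a shapemapper call with the per-read histogram flag added.
# $1: path to the target FASTA file (placeholder; use your own file).
qc_command() {
    local target="$1"
    echo shapemapper --name test_Long --target "$target" \
        --amplicon --primers prim --min-depth 1000 \
        --modified --folder plus --untreated --folder minus \
        --output-parsed-mutations --per-read-histograms
}
```

After a real run with this flag, look for the histogram tables in the log file named at the top of the ShapeMapper output.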

irinatu commented 2 years ago

Thanks a lot for your help and for explaining the options I used. Actually, I used --preserve-order, --output-processed-reads, and --output-aligned-reads so that I could parse the BAM/SAM file in case we had any doubts.

Weeks-UNC / shapemapper2

not understandable Error message about primers #31