MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

Received a NodeError while running the pipeline #53

Open farahwar opened 9 months ago

farahwar commented 9 months ago

Hello! I ran into the following error while trying to run my sample.

38 INFO Validating FASTA files
16:44:38 INFO Building BAM pipeline for 'N_chinensis-novo.yaml'
16:44:38 INFO Running BAM pipeline
16:44:38 INFO Checking file dependencies
16:44:38 INFO Checking for auxiliary files
16:44:38 INFO Checking required software
16:44:38 INFO  - Found Rscript v3.4.4
16:44:38 INFO  - Found AdapterRemoval v2.3.1
16:44:38 INFO  - Found BWA v0.7.17
16:44:38 INFO  - Found Picard tools v2.23
16:44:39 INFO  - Found R module: Rcpp v0.12.15
16:44:39 INFO  - Found R module: RcppGSL v0.3.3
16:44:39 INFO  - Found R module: gam v1.14.4
16:44:39 INFO  - Found R module: ggplot2 v2.2.1
16:44:39 INFO  - Found R module: inline v0.3.14
16:44:39 INFO  - Found mapDamage v2.2.1
16:44:39 INFO  - Found samtools v1.10.0
16:44:39 INFO Determining states
16:44:39 INFO Ready
16:44:39 INFO [1/22] Started trimming SE adapters from '/media/birg/Disk_2/student/farah/paleomix2/data/O1_interleaved.fastq.gz'
16:44:39 INFO [2/22] Started validating '/media/birg/Disk_2/student/farah/paleomix2/prefixes/N_chinensis_novo.fasta'
16:44:39 ERROR NodeError while validating '/media/birg/Disk_2/student/farah/paleomix2/prefixes/N_chinensis_novo.fasta':
16:44:39 INFO Saving error logs to '/media/birg/Disk_2/student/farah/paleomix2/stats/bam_pipeline.20231227_164438_01.log'
16:44:39 ERROR     Error(s) running Node:
16:44:39 ERROR        Temporary directory: '/media/birg/Disk_2/student/farah/paleomix2/stats/temp/044fc85f-e3fb-4604-9661-7d6922492089'
16:44:39 ERROR     
16:44:39 ERROR     FASTA sequence contains invalid characters
16:44:39 ERROR         Filename = '/media/birg/Disk_2/student/farah/paleomix2/prefixes/N_chinensis_novo.fasta'
16:44:39 ERROR         Line = 106
16:44:39 ERROR         Invalid characters = '*'
16:48:35 INFO [0/0] Finished trimming SE adapters from '/media/birg/Disk_2/student/farah/paleomix2/data/O1_interleaved.fastq.gz'

What could this mean, and what can I do to avoid this error? My guess is that the FASTA file I'm using has some '*' characters in it. Thank you in advance!

MikkelSchubert commented 9 months ago

Hi Fawa,

You are right that your FASTA file appears to contain the character *.

This is not supported by the pipeline, since it is not supported by many other tools. If you want to use that FASTA file, you will need to convert non-standard bases to Ns.

Best, Mikkel
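
For anyone hitting the same error, the conversion suggested above could be sketched in Python roughly as follows. This is a minimal sketch and not part of paleomix. It assumes that only A, C, G, T and N (in either case) should be kept, and that every other character on a sequence line, including '*', can be replaced with N. The function names and the file paths in the usage note are hypothetical.

```python
import re

# Assumption: only A, C, G, T and N (upper or lower case) are kept as-is;
# anything else on a sequence line (e.g. '*') is replaced with 'N'.
_NON_STANDARD = re.compile(r"[^ACGTNacgtn]")


def mask_non_standard_bases(lines):
    """Yield FASTA lines with non-standard bases replaced by 'N'."""
    for line in lines:
        text = line.rstrip()  # drop the newline and trailing whitespace
        if text.startswith(">"):
            # Header lines are passed through unchanged.
            yield text
        else:
            yield _NON_STANDARD.sub("N", text)


def clean_fasta(src_path, dst_path):
    """Write a masked copy of the FASTA file at src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in mask_non_standard_bases(src):
            dst.write(line + "\n")
```

It could then be called as `clean_fasta("N_chinensis_novo.fasta", "N_chinensis_novo.clean.fasta")`, with the prefix in the YAML file pointed at the cleaned copy. Writing to a new file keeps the original untouched, which makes it easy to check line 106 before and after the conversion.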