bedapub / HBVouroboros

HBVouroboros automates sequencing-based HBV genotyping and expression profiling
GNU General Public License v3.0

Mapped read files are empty #43

Open giuliafrrn opened 1 year ago

giuliafrrn commented 1 year ago

I tried to run the pipeline with a dataset, but had several problems.

First I ran the pipeline without adjusting the config, except for the sample annotation file. After that I tried setting doPerSamp to True. With both configurations, the mapped read files are empty and the pipeline fails (see the error message below).

# running normalization on reads: $VAR1 = [
          [
            '/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz'
          ],
          [
            '/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz'
          ]
        ];

Tuesday, May 23, 2023: 16:10:39 CMD: /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl --seqType fq --JM 10G  --max_cov 200 --min_cov 1 --CPU 1 --output /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/perSamp_trinity/02_Sample/trinity/insilico_read_normalization --max_CV 10000  --left /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz --right /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz --pairs_together  --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A -R 1  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz) >> left.fa
CMD: seqtk-trinity seq -A -R 2  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz) >> right.fa
Error, no records were correctly parsed from /dev/fd/63Thread 1 terminated abnormally: Error, cmd: seqtk-trinity seq -A -R 1  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz) >> left.fa died with ret 1280 at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 793.
Error, no records were correctly parsed from /dev/fd/63Thread 2 terminated abnormally: Error, cmd: seqtk-trinity seq -A -R 2  <(gunzip -c /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz) >> right.fa died with ret 1280 at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 793.
Error, conversion thread failed at /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl line 336.
Error, cmd: /gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt/trinity-2.12.0/util/insilico_read_normalization.pl --seqType fq --JM 10G  --max_cov 200 --min_cov 1 --CPU 1 --output /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/perSamp_trinity/02_Sample/trinity/insilico_read_normalization --max_CV 10000  --left /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_1.fq.gz --right /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/results/02_Sample_mapped_reads_2.fq.gz --pairs_together  --PARALLEL_STATS   died with ret 7424 at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 2869.
        main::process_cmd("/gpfs/scratchfs01/site/u/ferraing/conda/envs/HBVouroboros/opt"...) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 3422
        main::normalize("/gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FD"..., 200, ARRAY(0x55f6c69af7b0), ARRAY(0x55f6c69af7f8)) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 3362
        main::run_normalization(200, ARRAY(0x55f6c69af7b0), ARRAY(0x55f6c69af7f8)) called at /home/ferraing/scratch/conda/envs/HBVouroboros/bin/Trinity line 1384
[Tue May 23 16:10:39 2023]
Error in rule run_trinity_perSamp:
    jobid: 125
    input: results/02_Sample_mapped_reads_1.fq.gz, results/02_Sample_mapped_reads_2.fq.gz
    output: results/perSamp_trinity/02_Sample/trinity/Trinity.fasta

I also tried to set doInputRef and doPerSamp to True, but then the pipeline couldn't start at all.

MissingInputException in rule get_ref_strain_gb_inpt in file /gpfs/scratchfs01/site/u/ferraing/projects/2023-05-HBV-SNV-FDA-PS-13785/test/HBVouroboros/workflow/rules/align_reads.smk, line 335:
Missing input files for rule get_ref_strain_gb_inpt:
    output: results/inpt/inpt_strain.gb
    affected files:
        AB064313

As a reference, I used the sampleAnnotation file under the .test folder, and with this file the pipeline always worked.

Accio commented 1 year ago

Thanks @giuliafrrn for reporting the issue. I found that the error appears when NO reads are mapped to the HBV genome. To solve the issue, we will add checks for whether the BAM files are empty; if so, we will terminate the workflow gracefully.
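
Until such a check is in the workflow, the empty-input condition can be detected by hand before the Trinity step runs. The following is only a minimal sketch, assuming the mapped reads are written as gzip-compressed FASTQ files named like those in the log above. The function names `fastq_gz_has_reads` and `samples_without_reads` are hypothetical and are not part of HBVouroboros.

```python
import gzip
from pathlib import Path


def fastq_gz_has_reads(path):
    """Return True if the gzip-compressed FASTQ file holds at least one record.

    Hypothetical helper, not part of HBVouroboros.
    """
    path = Path(path)
    # A missing or zero-byte file cannot contain reads.
    if not path.exists() or path.stat().st_size == 0:
        return False
    with gzip.open(path, "rt") as handle:
        # The first non-blank line of a FASTQ file is a header starting with "@".
        for line in handle:
            if line.strip():
                return line.startswith("@")
    # The archive was valid but decompressed to nothing.
    return False


def samples_without_reads(fastq_pairs):
    """Return the names of samples where either mate file holds no reads.

    `fastq_pairs` maps a sample name to a (left, right) tuple of file paths.
    """
    return [
        sample
        for sample, (left, right) in fastq_pairs.items()
        if not (fastq_gz_has_reads(left) and fastq_gz_has_reads(right))
    ]
```

For example, calling `samples_without_reads({"02_Sample": ("results/02_Sample_mapped_reads_1.fq.gz", "results/02_Sample_mapped_reads_2.fq.gz")})` would list `02_Sample` if either file contains no reads, which is the situation that makes `seqtk-trinity` report "no records were correctly parsed".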