MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License
43 stars 19 forks

About the MinQuality setting #40

Closed (ztang040 closed 1 year ago)

ztang040 commented 3 years ago

Hi Mikkel,

I am running a set of stringency trials to count the number of hits under different "MinQuality" settings using the paleomix pipeline. The makefile and paleomix scripts worked properly when "MinQuality" was below 60. However, once "MinQuality" went beyond 60, an error occurred while writing the "summary" file. The error message is below:

14:29:06 INFO [3/6] Started writing summary to ./Sample_1.summary

14:29:06 ERROR NodeUnhandledException while writing summary to ./Sample_1.summary:

14:29:06 INFO Saving error logs to '/mnt/shared/scratch/ztang/project/324_Fix/quality_test/MinQ_60-100/MinQ_70/bam_pipeline.20210330_110345_01.log'

14:29:06 ERROR Error(s) running Node:

This issue occurs in all trials with "MinQuality" 60+, and the size of the output BAM file was also abnormal. Could you please suggest what the potential cause of this error might be? Thanks!

MikkelSchubert commented 3 years ago

Hi,

What version of paleomix are you using? You can determine this by running paleomix --version.

I am unfortunately not able to see the actual error from the snippet you posted, so could you try attaching the full log file to this issue? You can find it at /mnt/shared/scratch/ztang/project/324_Fix/quality_test/MinQ_60-100/MinQ_70/bam_pipeline.20210330_110345_01.log

Also, can you explain what you mean when you say that the size of the output BAM was "abnormal"?

Cheers, Mikkel

ztang040 commented 3 years ago

Hi Mikkel,

Thank you for your reply. I am using paleomix v1.3.2.

If it helps, I can provide more information to track down this issue.

The log file is below:

2021-03-31 13:19:27,466 paleomix.pipeline ERROR NodeUnhandledException while writing summary to ./Sample_1.summary:
2021-03-31 13:19:27,471 paleomix.pipeline ERROR Error(s) running Node:
2021-03-31 13:19:27,472 paleomix.pipeline ERROR Temporary directory: '/mnt/shared/scratch/ztang/project/324_Fix/quality_test/MinQ_60-100/MinQ_65/temp/f8aca52b-65d9-4cd6-920e-f0c947d5f7b4'
2021-03-31 13:19:27,472 paleomix.pipeline ERROR
2021-03-31 13:19:27,473 paleomix.pipeline ERROR Traceback (most recent call last):
2021-03-31 13:19:27,474 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/node.py", line 109, in run
2021-03-31 13:19:27,475 paleomix.pipeline ERROR     self._run(config, temp)
2021-03-31 13:19:27,476 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/pipelines/bam/parts/summary.py", line 117, in _run
2021-03-31 13:19:27,476 paleomix.pipeline ERROR     self._write_tables(table, genomes)
2021-03-31 13:19:27,477 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/pipelines/bam/parts/summary.py", line 156, in _write_tables
2021-03-31 13:19:27,478 paleomix.pipeline ERROR     for row in self._build_tables(genomes):
2021-03-31 13:19:27,479 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/pipelines/bam/parts/summary.py", line 163, in _build_tables
2021-03-31 13:19:27,480 paleomix.pipeline ERROR     self._read_tables(self._prefixes, genomes).items()
2021-03-31 13:19:27,480 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/pipelines/bam/parts/summary.py", line 212, in _read_tables
2021-03-31 13:19:27,481 paleomix.pipeline ERROR     libraries[library] = self._annotate_subtables(subtables, genomes)
2021-03-31 13:19:27,482 paleomix.pipeline ERROR   File "/home/ztang/miniconda3/envs/paleomix/lib/python3.6/site-packages/paleomix/pipelines/bam/parts/summary.py", line 258, in _annotate_subtables
2021-03-31 13:19:27,483 paleomix.pipeline ERROR     total_hits or "NaN"
2021-03-31 13:19:27,484 paleomix.pipeline ERROR TypeError: unsupported operand type(s) for /: 'int' and 'str'

Meanwhile, the size of the abnormal BAM was:

421K -rw-rw-r-- 1 ztang ztang 449K Mar 31 13:19 Sample_1.conch_baits.bam
9.5K -rw-rw-r-- 1 ztang ztang 166K Mar 31 13:19 Sample_1.conch_baits.bam.bai

In contrast, the sizes of the BAM files from the successful trials were:

543M -rw-rw-r-- 1 ztang ztang 590M Mar 30 02:11 Sample_1.conch_baits.bam
550K -rw-rw-r-- 1 ztang ztang 1.7M Mar 30 02:11 Sample_1.conch_baits.bam.bai

If there is anything more I can provide for the analysis, please ask!

Cheers, Zitian

MikkelSchubert commented 3 years ago

Hi Zitian,

Thank you very much! I've released a new version of paleomix (v1.3.3) that should fix this bug.

But it is not terribly surprising that your BAM files are that small: Depending on the mapper and settings used, few alignments will have a mapping quality of 60 or more, so you are filtering out all or most of your hits. That was also what caused the problem you ran into, which was due to a sample or library having 0 hits.

Cheers, Mikkel
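
To illustrate how a sample or library with 0 hits can lead to the `TypeError` shown in the log, here is a minimal sketch. It is not the actual paleomix code: the function names `fraction_buggy` and `fraction_fixed` and the `unique_hits` argument are hypothetical, and only the `total_hits or "NaN"` fallback and the failing `/` are taken from the traceback.

```python
def fraction_buggy(unique_hits, total_hits):
    # Mirrors the pattern visible in the traceback: when total_hits is 0,
    # `total_hits or "NaN"` evaluates to the string "NaN", and dividing
    # an int by a str raises TypeError.
    return unique_hits / (total_hits or "NaN")


def fraction_fixed(unique_hits, total_hits):
    # Hypothetical guard (an assumption, not the v1.3.3 fix itself):
    # return NaN explicitly when there are no hits to divide by.
    if not total_hits:
        return float("nan")
    return unique_hits / total_hits


try:
    fraction_buggy(0, 0)
except TypeError as error:
    print("buggy version failed:", error)

print("fixed version, no hits:", fraction_fixed(0, 0))
print("fixed version, some hits:", fraction_fixed(5, 10))
```

With any non-zero `total_hits` both versions behave the same, which would explain why the error only appeared once the filter removed every alignment.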

ztang040 commented 3 years ago

Hi Mikkel,

Thank you very much for the new version of paleomix! I will install it right away and rerun my tests on it.

Yes, more and more hits will be filtered out as the value of MinQ increases. But as far as I can tell, the number of hits decreased gradually from MinQ_10 to MinQ_60 in my previous trial (dropping steadily from 8204205 to 5681172). It therefore seems unlikely to go from 5 million alignment hits with MinQ_60 to 0 hits with MinQ_70.

Maybe the updated version will give me the answer to this. I will come back and report the progress after my new attempt.

ztang040 commented 3 years ago

Cheers, Zitian

ztang040 commented 3 years ago

Hi Mikkel,

You were right! The updated paleomix worked well, and there are indeed no hits for any MinQ beyond 65. As a result, the outcome from 60 is what I need. Thanks again for the new version!

Best wishes, Zitian
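
The pattern reported in this thread, a gradual decline in hits up to MinQuality 60 followed by no hits at all, is what one would expect if the mapper never assigns a mapping quality above 60. A minimal sketch with made-up numbers: the `mapqs` list and the `count_passing` helper are hypothetical, the cap of 60 is an assumption about the mapper used here, and treating the threshold as "at least MinQuality" is also an assumption.

```python
def count_passing(mapqs, min_quality):
    # Count alignments whose MAPQ is at least the threshold,
    # analogous to a minimum-quality filter.
    return sum(1 for q in mapqs if q >= min_quality)


# Made-up MAPQ values, capped at 60 (assumed mapper maximum).
mapqs = [0, 3, 12, 25, 37, 40, 60, 60, 60, 60]

for threshold in (10, 30, 60, 65, 70):
    print(f"MinQuality {threshold}: {count_passing(mapqs, threshold)} hits")
```

In this toy data the count shrinks step by step up to 60 and then drops straight to zero, because no alignment can score higher than the cap.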