fritzsedlazeck / Spectre

Copy number caller for long read data including SNV utilization
MIT License
54 stars 3 forks source link

KeyError "MT" #31

Closed sandratra-rab closed 3 months ago

sandratra-rab commented 3 months ago

Hi,

I was testing Spectre latest release using mosdepth output and by submitting the following commandline:

CNVCaller --coverage test_Q20.regions.bed.gz --sample-id test --output-dir CNV_Calling/spectre/test/ --reference Ensembl/GRCh38.109/homo_sapiens.GRCh38.109.fasta --only-chr 1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,3,4,5,6,7,8,9,MT,X,Y

I knew it should output PNG pictures for every chromosomes requested with the --only-chr parameter (even when no particular long CNV would be found). However when reaching "MT" chromosome turn, i got this error: KeyError: 'MT'

Traceback (most recent call last): File "devtest/envs/conda/d63adcd6ce3856b3a13f63b0d5a81f6a/bin/spectre", line 8, in sys.exit(main()) File "devtest/envs/conda/d63adcd6ce3856b3a13f63b0d5a81f6a/lib/python3.10/site-packages/spectre/spectre.py", line 642, in main spectre_run.spectre_exe() File "devtest/envs/conda/d63adcd6ce3856b3a13f63b0d5a81f6a/lib/python3.10/site-packages/spectre/spectre.py", line 356, in spectre_exe spectre_main.cnv_call() File "devtest/envs/conda/d63adcd6ce3856b3a13f63b0d5a81f6a/lib/python3.10/site-packages/spectre/spectreCNV.py", line 88, in cnv_call self.cnv_analysis.data_normalization() File "devtest/envs/conda/d63adcd6ce3856b3a13f63b0d5a81f6a/lib/python3.10/site-packages/spectre/analysis/analysis.py", line 187, in data_normalization self.coverage = tmp_genome_wide_coverage_dict[reference_chromosome] KeyError: 'MT'

While generating the MDR file from the Ensembl GRCh38 r109 fasta file, Spectre returned this specific line regarding the MT chromosome (it worked perfectly fine for the previous chromosome numbers):

spectre::2024-08-22 20:05:41,319::INFO::spectre.util.metadata.metadataCollector> Reading >MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF spectre::2024-08-22 20:05:41,326::INFO::spectre.util.metadata.metadataCollector> length: 16569 []

Is it normal to get this error for MT chromosome specifically ? I have read the issues #2 and #29 but MT chromosome is indeed in my reference fasta file with the correct name written for the --only-chrparameter and MT sequence does not seem to contain any N regions.

When running the same commandline without the --only-chr option, I get no error but no PNG file for MT chromosome is generated neither (same for various alt chromosomes inside my reference fasta file by the way, only chr 1-22, X, Y and only one alternative chromosome got a PNG output and I don't understand how it got selected).

Thank you in advance for your help!

philippesanio commented 3 months ago

Hi @sandratra-rab

Thanks for raising that question.

Is it normal to get this error for MT chromosome specifically ? I have read the issues https://github.com/fritzsedlazeck/Spectre/issues/2 and https://github.com/fritzsedlazeck/Spectre/issues/29 but MT chromosome is indeed in my reference fasta file with the correct name written for the --only-chr parameter and MT sequence does not seem to contain any N regions.

No, this should not be specific to MT per see. According to your description, I am guessing that there is no chromosome MT in the sample coverage data. Could you be so kind and confirmation for me if there is a chromosome MT in Mosdepth coverage data? You can check that by either looking at the in the Mosdepth sample summary file or searching in the sample coverage file regions.bed.gz.

Also, could you be so kind and check which version of Spectre you are running? spectre version. If you want the latest development version, please clone the Github repo and install Spectre locally using pip.

When running the same commandline without the --only-chr option, I get no error but no PNG file for MT chromosome is generated neither (same for various alt chromosomes inside my reference fasta file by the way, only chr 1-22, X, Y and only one alternative chromosome got a PNG output and I don't understand how it got selected).

Spectre generates only a coverage plot for every chromosome that is available for analysis.

Just a side note, please make sure to use the blacklist as it masks problematic regions, especially around the telomeres and centromeres. It is based on the gap data from UCSC grch38_blacklist.bed. We recommend using the grch38_blacklist_spectre.bed has been modified by expanding the masked regions, since a lot of samples we have tested showed poor coverage signals in those regions. In your case, you might have to change the chromosome names in the blacklist from eg "chr1" to "1". Not using the blacklist might increase your FPs dramatically, especially in the difficult regions. If you need different gap data, feel free to modify them.

Also, if you want to run Spectre on different samples with the same reference, make sure to tell Spectre with --metadata where it can find the MDR file. Otherwise, Spectre will calculate the same MDR file for every sample you are using again and again, which will cost a lot of time.

Best, Philippe

sandratra-rab commented 3 months ago

Thanks for your quick answer and your recommandations! I will launch it again using the grch38 blacklist file and keep the MDR file for later use.

I am using Spectre 0.2.1 version (installed via conda environment + pip). Here are the mosdepth output for MT chromosome:

In mosdepth regions.bed.gz file:

MT 0 1000 262.03 MT 1000 2000 130.77 MT 2000 3000 267.63 MT 3000 4000 158.50 MT 4000 5000 143.08 MT 5000 6000 227.38 MT 6000 7000 168.80 MT 7000 8000 197.55 MT 8000 9000 228.50 MT 9000 10000 236.10 MT 10000 11000 233.10 MT 11000 12000 349.48 MT 12000 13000 252.00 MT 13000 14000 210.88 MT 14000 15000 227.65 MT 15000 16000 202.20 MT 16000 16569 578.95

In mosdepth summary file :

chrom length bases mean min max MT 16569 3825050 230.86 50 800 MT_region 16569 3825050 230.86 50 800

Just in case, I have checked but no invisible special character was introduced between "MT" and the tab separator.

philippesanio commented 3 months ago

Hi @sandratra-rab

The coverage file seems good to me. However, I think you will run into the limitation, that Spectre does not report any CNVs under 10kb even if you push the minimum CNV length below that.

I am using Spectre 0.2.1 version (installed via conda environment + pip). Here are the mosdepth output for MT chromosome:

I think you might want to install the latest version of Spectre from GitHub. The latest version Spectre version is 0.2.1-patch-july15, which addresses a lot of bug fixes. However, they will be pushed to pip with the next updated, which is coming soon.

Installing Spectre locally with pip:

pip install build git clone https://github.com/fritzsedlazeck/Spectre.git cd ./Spectre python3 -m build pip install dist/spectre_cnv-.tar.gz # replace with e.g. 0.2.0

Feel free to ping me anytime if you encounter any other issues.

Best, Philippe