andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

clarification about the result #221

Closed Petrichor-sudo closed 3 weeks ago

Petrichor-sudo commented 3 months ago

Hi, I have some questions regarding the result and hope to get clarified.

The following is the result of running these two commands:

    freyja variants /my_path/my_bam.bam --variants /my_path/test_var.tsv --depths /my_path/test_depth.depth --ref /my_path/allfasta.fsa --refname original
    freyja demix /my_path/test_var.tsv /my_path/test_depth.depth --output /my_path/test_output.tsv

    summarized  [('Other', 0.9436806231356629), ('Delta', 0.041632839854993144)]
    lineages    B.5 B B.55 B.23 B.1.14 B.1.617.2 B.1.617 B.1.384 B.1.78 B.1.91 B.1.139 B.1 B.1.67 B.1.76 B.1.177.85 B.1.97 B.1.1.349 B.1.528 B.1.1.127 B.1.1.169 B.1.177.3 B.1.151 B.51 B.1.146 B.1.1.185
    abundances  0.15966555 0.15966555 0.15966555 0.15966555 0.15966555 0.04163284 0.02295991 0.01366805 0.01366805 0.01366805 0.01366805 0.01366805 0.01366805 0.01366805 0.00537634 0.00454545 0.00265252 0.00224972 0.00215517 0.00198610 0.00178891 0.00173611 0.00167224 0.00147059 0.00108342
    resid       2.7290875605839875
    coverage    42.7214660736381

Here, allfasta.fsa is the reference file I used to align the BAM file. It contains six reference genomes named "original", "delta", "alpha", "beta", "gamma", and "omicron".

My questions are:

  1. Since freyja variants accepts only one reference genome, I added the --refname option, but only "original" works. Other names such as "delta" raise an error during demix:
    
    Traceback (most recent call last):
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
    File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
    File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 2606, in pandas._libs.hashtable.Int64HashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 2630, in pandas._libs.hashtable.Int64HashTable.get_item
    KeyError: 29791

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
    File "/mnt/data/covid19_summer/mambaforge/bin/freyja", line 10, in <module>
    sys.exit(cli())
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/freyja/cli.py", line 138, in demix
    mix, depths, cov = build_mix_and_depth_arrays(variants, depths, muts,
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/freyja/sample_deconv.py", line 77, in build_mix_and_depth_arrays
    depths = pd.Series({kI: df_depth.loc[int(re.findall(r'\d+', kI)[0]), 3]
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/freyja/sample_deconv.py", line 77, in <dictcomp>
    depths = pd.Series({kI: df_depth.loc[int(re.findall(r'\d+', kI)[0]), 3]
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/pandas/core/indexing.py", line 1183, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/pandas/core/frame.py", line 4209, in _get_value
    row = self.index.get_loc(index)
    File "/mnt/data/covid19_summer/mambaforge/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    raise KeyError(key) from err
    KeyError: 29791



The Delta reference genome is 29789 bases long. I searched through the issues but could not solve this problem. What could be causing this error, and how can I fix it?

2. Is the result based on the reference genome I specified? I set the refname to "original", yet "Delta" and many other lineages appear in the result. Could someone please explain in detail how the result is produced, and how the reference genome I specified is used?

3. There are some lineages that I can't find in the curated lineages or in the barcodes, so I don't know how I should classify them.

Thank you!

joshuailevy commented 3 months ago

Hey @Petrichor-sudo!

  1. All SARS-CoV-2 alignments need to be done to the Hu-1 reference genome (your "original"), as described in the README, in order for the SNPs to map to the correct positions. Other lineages/variants that have indels relative to Hu-1 will lead to incorrect mappings, and will likely cause errors like the one you experienced.
  2. freyja variants uses iVar under the hood, which basically counts the bases of each type at each site using mpileup.
  3. You'll need to run freyja update to access these. As soon as they are incorporated into the UShER tree, they should be available in both.

Josh
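
To illustrate point 1, here is a minimal sketch of how the KeyError in the traceback can come about. It assumes the depth table only covers positions 1 through 29789 (the Delta reference length quoted in the question) while a mutation label refers to position 29791 in Hu-1 coordinates. The function name, the mutation labels, and the depth values are hypothetical; only the digit-extraction step mirrors the line shown in the traceback.

```python
import re

def depth_at_mutation(mutation, depth_by_position):
    """Look up the read depth at the position encoded in a mutation label."""
    # Same idea as the traceback line: take the first run of digits in a
    # label such as "C29791T" and use it as the lookup key.
    position = int(re.findall(r"\d+", mutation)[0])
    return depth_by_position[position]  # KeyError if the position is absent

# Hypothetical depth table for a 29789-base reference (made-up depth of 100).
delta_length = 29789
depth_by_position = {pos: 100 for pos in range(1, delta_length + 1)}

# A position inside the reference is found.
print(depth_at_mutation("A100G", depth_by_position))

# A position past the end of the shorter reference is not.
try:
    depth_at_mutation("C29791T", depth_by_position)
except KeyError as err:
    print("KeyError:", err)
```

Under this assumption, the error is a coordinate mismatch rather than a problem with the BAM file itself, which is consistent with the advice to call variants against Hu-1.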

Petrichor-sudo commented 3 months ago

So basically the result is not really produced from the reference.fsa file I provide; the tool just needs the reference I used for alignment in order to run? The demix command then uses the variants file and depth file produced by the variants command (which are actually obtained with iVar and samtools?) to generate the result.

Is my understanding correct?

Thanks
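
As a rough illustration of the idea behind depth-weighted de-mixing, the toy below estimates the share of two lineages from mutation frequencies and depths. It is not Freyja's actual solver: the barcodes, frequencies, depths, and the simple grid search are all made up for this sketch. The point it tries to show is that lineage calls come from comparing observed mutation frequencies with per-lineage mutation barcodes, so a lineage such as Delta can appear in the output even when the reads were called against "original".

```python
# Hypothetical barcodes: 1 means the lineage carries the mutation at that site.
barcodes = {
    "lineage_A": [1, 1, 0, 0],
    "lineage_B": [0, 1, 1, 0],
}
observed_freq = [0.25, 1.0, 0.75, 0.0]  # made-up mutation frequencies per site
depths = [100, 80, 120, 50]             # made-up read depths at the same sites

def weighted_error(frac_a):
    """Depth-weighted absolute error for a mix of frac_a A and (1 - frac_a) B."""
    total = 0.0
    rows = zip(barcodes["lineage_A"], barcodes["lineage_B"], observed_freq, depths)
    for has_a, has_b, freq, depth in rows:
        predicted = frac_a * has_a + (1 - frac_a) * has_b
        total += depth * abs(freq - predicted)  # deeper sites count for more
    return total

# Try every mixing fraction from 0.00 to 1.00 and keep the best fit.
candidates = [i / 100 for i in range(101)]
best_frac_a = min(candidates, key=weighted_error)
print({"lineage_A": best_frac_a, "lineage_B": 1 - best_frac_a})
```

In this toy, the reference genome never enters the fit directly. It only determines the coordinates in which the mutations are reported, which is why those coordinates need to match the ones the barcodes use.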

Petrichor-sudo commented 3 months ago

I have six reference genomes in my reference file, and I used all of them for the alignment without specifying a particular one. Should I align to only one reference genome instead?

joshuailevy commented 3 weeks ago

Sorry this got missed @Petrichor-sudo. You can use a multi-reference alignment, but you'll need to specify the Hu-1 reference name in the freyja variants step, since that's not the default behavior. Closing since this hasn't been active for a while, feel free to reopen if needed.
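
For anyone scripting this, the two steps can be sketched as a small helper that builds the command lines. It only uses the subcommands and options that already appear in the original post. The helper name is hypothetical, and "original" is assumed to be the sequence name of Hu-1 in the multi-reference FASTA, as in this thread.

```python
import shlex

def freyja_commands(bam, ref_fasta, hu1_name, variants_tsv, depth_file, output_tsv):
    """Build the freyja variants and freyja demix command lines as argument lists."""
    variants_cmd = [
        "freyja", "variants", bam,
        "--variants", variants_tsv,
        "--depths", depth_file,
        "--ref", ref_fasta,
        "--refname", hu1_name,  # name of the Hu-1 sequence inside ref_fasta
    ]
    demix_cmd = ["freyja", "demix", variants_tsv, depth_file, "--output", output_tsv]
    return variants_cmd, demix_cmd

variants_cmd, demix_cmd = freyja_commands(
    bam="/my_path/my_bam.bam",
    ref_fasta="/my_path/allfasta.fsa",
    hu1_name="original",
    variants_tsv="/my_path/test_var.tsv",
    depth_file="/my_path/test_depth.depth",
    output_tsv="/my_path/test_output.tsv",
)
print(shlex.join(variants_cmd))
print(shlex.join(demix_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once Freyja is installed. The sketch only prints the commands, so it runs without Freyja present.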