CNV Call output empty, lack of RD histogram

shesanIsland commented 1 year ago

CNVpytor

attempt to calculate cnv in bam files

Done:

Created .pytor file with gc_file, mask_file, specified chromosomes and imported read depth signals using:

cnvpytor -root /filepath/filename.pytor \
-rd /filepath/filename.bam \
-partition 100 \
-call 100 \
-chrom backbone, transgene, AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X \
-T /path_Reference/genome_edit.fa

-ls provides the following:

`Filename '/filepath/filename.pytor'
---------------------------------------------------------------------------------------
File created: 2023-02-28 20:03 using CNVpytor ver 1.2.1

Chromosomes with RD signal: backbone, transgene, AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X

Chromosomes with SNP signal: 

Using reference genome: agamp4 [ GC: yes, mask: yes ]

Chromosomes with RD histograms [bin sizes]:  []

Chromosomes with SNP histograms [bin sizes]:  []

Chromosome lengths: {'backbone': '12339', 'transgene': '8299', 'AgamP4_2L': '49364325', 'AgamP4_2R': '61545105', 'AgamP4_3L': '41963435', 'AgamP4_3R': '53200684', 'AgamP4_X': '24393108'}`

Issues:

calculating histogram read depth in order to create manhattan and genomic/chromosome plots.

Have tried: cnvpytor -root /filepath/filename.pytor -stat 100 cnvpytor -root /filepath/filename.pytor -his 100 cnvpytor -root /filepath/filename.pytor -partition 100 cnvpytor -root /filepath/filename.pytor -call 100 cnvpytor -root /filepath/filename.pytor -view 100

-v debug doesn't show or highlight any errors...
print calls in view doesn't show any calls..
manhattan plot only shows a plot with missing data and y-axis with the same range limits
also tried in python3, still empty... is there something wrong with my bam file or the chromsome names (despite being specified and extracted).

Many thanks,

arpanda commented 1 year ago

I assume you configured your genome properly and reference genome name can be seen by your ls command.

Using reference genome: agamp4 [ GC: yes, mask: yes ]

But your rd histogram is empty

Chromosomes with RD histograms [bin sizes]:  []

Could you please try a different bin size like 1000 instead of 100 and then using ls command check this Chromosomes with RD histograms [bin sizes]: field. You will see something like this if you processed it using 1000 bin size

Chromosomes with RD histograms [bin sizes]:  [1000]

Thank You Arijit

shesanIsland commented 1 year ago

Thanks for the reply. I receive the same output as my initial issue even after changing bin size to 1000.

arpanda commented 1 year ago

Could you please check if your reference genome configuration is correct or not. There are two ways to add a new reference genome.

Using -conf parameter
Copying your reference configuration file to ~/.cnvpytor/reference_genomes_conf.py. Ref Link.

I assume you followed the second way, as there is no -conf parameter in the shared commands. Could you please confirm it.

Also, for the rd command, please remove the -partition 100 -call 100parameters. It should be something like this

cnvpytor -root /filepath/filename.pytor \
-rd /filepath/filename.bam \
-chrom backbone, transgene, AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X

then run the rest of the command.

cnvpytor -root /filepath/filename.pytor -stat 100
cnvpytor -root /filepath/filename.pytor -his 100
cnvpytor -root /filepath/filename.pytor -partition 100
cnvpytor -root /filepath/filename.pytor -call 100
cnvpytor -root /filepath/filename.pytor -view 100

If the reference genome is provided using-conf parameter. For each command, you need to add -conf parameter..

shesanIsland commented 1 year ago

Yes, I've also created the .pytor file without the partition and call parameters and had the same result.

A ref_genome_conf.py file was created with the second method:

import_reference_genomes={
  "agamp4":{
    "name": "AgamP4",
    "species": "Anopheles Gambiae",
    "chromosomes": OrderedDict(
      [("backbone", (12339, "A")), ("transgene", (8299, "A")),
 ("AgamP4_2L", (49364325, "A")), ("AgamP4_2R", (61545105, "A")),
       ("AgamP4_3L", (41963435, "A")), ("AgamP4_3R", (53200684, "A")), ("AgamP4_UNKN", (42389979, "A")), ("AgamP4_X", (24393108, "S")), ("AgamP4_Y_unplaced", (237045, "S"))]),
    "gc_file":"/filepath/AgamP4_gc_file.pytor",
          "mask_file": "/filepath/AgamP4_mask_file.pytor"
  }
}

Masked file created with: cnvpytor -root /filepath/AgamP4_mask_file.pytor -mask /filepath/genome.fa -make_mask_file

shesanIsland commented 1 year ago

The bam file headers were previously edited alongside the chromosome names in the reference genome file to remove '()' characters. I have made sure they match so I assume this is not causing an issue.

arpanda commented 1 year ago

Do you have P bases in /filepath/genome.fa file that you used to created the AgamP4_mask_file.pytor if not, please try without the mask file. Otherwise, will you able to share the AgamP4_mask_file.pytor, AgamP4_gc_file.pytor, AgamP4_mask_file.pytor via email ? I can check the files.

shesanIsland commented 1 year ago

Hello, I simply decided to start again by removing the conda environment and reinstalling. I have successfully used the stats, partition and call parameters with -conf, however, I have come up with another error.

-ls gives the following:

Filename '/filepath/filename.pytor'
----------------------------------------------------------------------------------------
File created: 2023-03-07 16:38 using CNVpytor ver 1.2.1

Chromosomes with RD signal: backbone, transgene, AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X

Chromosomes with SNP signal: 

Using reference genome: agamp4 [ 
Chromosomes with RD histograms [bin sizes]: transgene, AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X, backbone [100]

Chromosomes with SNP histograms [bin sizes]:  []

Chromosome lengths: {'backbone': '12339', 'transgene': '8299', 'AgamP4_2L': '49364325', 'AgamP4_2R': '61545105', 'AgamP4_3L': '41963435', 'AgamP4_3R': '53200684', 'AgamP4_X': '24393108'}

However the cnv/ read depth is missing from the manhattan plot when I use:

cnvpytor -conf /filepath/ref_genome_conf.py -root /filepath/filename.pytor -view 100
set rd_use_mask
print calls
# set panels rd
set chrom backbone transgene AgamP4_2L AgamP4_2R AgamP4_3L AgamP4_3R AgamP4_X
manhattan
save /filepath/plotname.png
quit

I simply get an empty plot. Depending on other parameters that have been used, sometimes a message pops up explaining :

opt/miniconda3/envs/cnvpytor_env/lib/python3.7/site-packages/cnvpytor/viewer.py:1997: UserWarning: Attempting to set identical left == right == -0.0 results in singular transformations; automatically expanding. ax.set_xlim([-l 0.0, (l - 1) 1.0])

I have also tried :

cnvpytor -conf /filepath/ref_genome_conf.py \
-root /filepath/filename.pytor \
-plot manhattan 100 \
-chrom "backbone", "transgene", "AgamP4_2L", "AgamP4_2R", "AgamP4_3L", "AgamP4_3R", "AgamP4_X" \
-o /filepath/plotname.png

This only gave a plot with overlapping chromosome names...

Lastly, is it possible to merge regions ie find the average copy number of each chromosome? I am trying to compare the transgene with the reference. Thanks

arpanda commented 1 year ago

You are still having empty Manhattan plot. Right? The reference genome section probably has an issue. See this section in ls result.

Using reference genome: agamp4 [

If you have used mask file, it should be like agamp4 [ GC: yes, mask: yes ] if, not it will be like agamp4 [ GC: yes, mask: no ]

For the warning,

opt/miniconda3/envs/cnvpytor_env/lib/python3.7/site-packages/cnvpytor/viewer.py:1997: UserWarning: Attempting to set identical left == right == -0.0 results in singular transformations; automatically expanding. ax.set_xlim([-l 0.0, (l - 1) 1.0])

This warning is due to the visualization library and not creating any issue for now, Please ignore it for now. We will try to address it in the coming releases.

Lastly, is it possible to merge regions ie find the average copy number of each chromosome? I am trying to compare the transgene with the reference.

The question is clear to me. As per my understanding, you are trying to find out if the contig called transgene has any copy number variation or not. Am I right? If yes, you can find that by cnv calls or by looking at the Manhattan plot.

shesanIsland commented 1 year ago

Odd how that happens despite seeing the following information when importing ref_genome_conf.py, rd, .bam, etc.:

cnvpytor.bam - INFO - Detected reference genome: agamp4

Yes to your question. My workaround was simply adding the ref_genome_conf.py code to the genome.py script. I now have Using reference genome: agamp4 [ GC: yes, mask: yes ]

As per my understanding, you are trying to find out if the contig called transgene has any copy number variation or not. Am I right? If yes, you can find that by cnv calls or by looking at the Manhattan plot.

Yes to your question, however, I am still getting an empty manhattan plot when using:

cnvpytor -conf /filepath/ref_genome_conf.py -root /filepath/filename.pytor -view 100
set style bmh
set rd_use_mask
print calls
set panels rd
set chrom backbone transgene AgamP4_2L AgamP4_2R AgamP4_3L AgamP4_3R AgamP4_X
manhattan

Result:

arpanda commented 1 year ago

The mask file has no P bases and the set rd_use_mask is used.

abyzovlab / CNVpytor

CNV Call output empty, lack of RD histogram #168