UMCUGenetics / MutationalPatterns

R package for extracting and visualizing mutational patterns in base substitution catalogues
MIT License
104 stars 45 forks

Error Reading in VCF files for rheMac10 #65

Closed: Rashesh7 closed this issue 2 years ago

Rashesh7 commented 3 years ago

Hello,

I have some Macaque data and was trying to read in the VCF files using the following command:

read_vcfs_as_granges("~/rs30-117/Macaque_Signatures-Conrad/macaca_mulatta/macaca_mulatta.vcf.gz", "Shared", BSgenome.Mmulatta.UCSC.rheMac10)

But I get the following error:

The style specified by 'UCSC' does not have a compatible entry for the species Macaca mulatta
Error: The vcf could not be filtered for the specific seqlevels group. You can run this function with group = 'all', to prevent this error. (The message of the internal error causing this problem is shown above.)

The input file is from Ensembl, so the file format is fine. I also checked BSgenome.Mmulatta.UCSC.rheMac10 and it is installed correctly and shows me the correct chromosomes.

Can you please let me know whether I am running this incorrectly? Or how I can resolve this error?

Many Thanks, Rashesh

FreekManders commented 3 years ago

Hi Rashesh,

The function tries to change the chromosome names of the vcf, so that they match the chromosome names of the BSgenome object. However, this isn't working for your species. You can run the code like this to prevent the error:

gr = read_vcfs_as_granges("~/rs30-117/Macaque_Signatures-Conrad/macaca_mulatta/macaca_mulatta.vcf.gz",
                          "Shared",
                          BSgenome.Mmulatta.UCSC.rheMac10,
                          group = "none",
                          change_seqnames = FALSE)

You then have to make sure yourself that the chromosome names of the VCF and the BSgenome object match, for example by changing the chromosome names of the VCF with seqnames(gr) = YOUR_SEQNAMES.
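As a rough sketch of that manual renaming step: Ensembl files typically name chromosomes "1", "2", "X", "MT", while UCSC genomes use "chr1", "chr2", "chrX", "chrM". The helper below, ensembl_to_ucsc, is hypothetical and not part of MutationalPatterns. It builds an old-to-new name mapping in base R, assuming the VCF contains only the main chromosomes. Unplaced scaffolds follow different UCSC naming and would need separate handling or removal.

```r
# Hypothetical helper (not part of MutationalPatterns): convert Ensembl-style
# chromosome names ("1", "X", "MT") to UCSC-style names ("chr1", "chrX", "chrM").
# Assumption: only main chromosomes are present; scaffolds are not handled.
ensembl_to_ucsc <- function(ensembl_names) {
  ucsc_names <- paste0("chr", ensembl_names)
  # The mitochondrial genome is "MT" in Ensembl but "chrM" in UCSC.
  ucsc_names[ensembl_names == "MT"] <- "chrM"
  # Named vector: names are the old levels, values are the new levels.
  stats::setNames(ucsc_names, ensembl_names)
}

mapping <- ensembl_to_ucsc(c("1", "2", "X", "Y", "MT"))
print(mapping)

# With GenomeInfoDb installed and `gr` returned by read_vcfs_as_granges(),
# the mapping could be applied like this (not tested against this VCF):
#   gr <- GenomeInfoDb::renameSeqlevels(
#     gr, ensembl_to_ucsc(GenomeInfoDb::seqlevels(gr)))
```

Afterwards it is worth comparing the renamed levels with seqnames(BSgenome.Mmulatta.UCSC.rheMac10) to confirm that every chromosome in the VCF has a match in the reference genome.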

Rashesh7 commented 3 years ago

Hi @FreekManders,

Many thanks! Worked perfectly!