ekawaler / pyQUILTS

Rebuilding QUILTS in Python.

genome ref #16

Closed. summerghw closed this issue 4 years ago.

summerghw commented 4 years ago

Hi, I used ucsc.hg19 as the reference, but I got errors like these:

Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrGL000191.1.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrGL000209.1.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1032_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG104_HG975_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1063_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1079_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1082_HG167_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG115_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1208_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1211_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG122_PATCH.fa.cmp1
Error opening: /lustre/user/lixue/ref/protemic/genome_ref//chrHG1257_PATCH.fa.cmp1

I tried using an Ensembl reference instead, but it uses different chromosome names. How can I find the right reference for hg19? Thank you very much.
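On the naming difference: Ensembl names canonical chromosomes "1", "2", ..., "X", "Y", while UCSC hg19 uses "chr1", "chr2", ..., "chrX", "chrY". Below is a minimal sketch that renames the headers of an Ensembl genome FASTA to UCSC style and drops everything else (scaffolds such as GL000191.1 and patches such as HG1032_PATCH). It assumes the Ensembl files are GRCh37, the assembly hg19 corresponds to, and that pyQUILTS wants "chr"-prefixed names; the latter is only inferred from the paths in the errors above. The mitochondrial sequence is deliberately left out, because hg19's chrM is, as far as I know, not the same sequence as GRCh37's MT. The function names are hypothetical, not part of pyQUILTS.

```python
# Canonical Ensembl (GRCh37) chromosome names: 1-22, X, Y.
# MT is intentionally excluded (hg19 chrM is a different mitochondrial sequence).
CANONICAL = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name: str):
    """Return the UCSC-style name for a canonical Ensembl chromosome, else None."""
    if name in CANONICAL:
        return "chr" + name
    return None


def rename_fasta_headers(lines):
    """Yield FASTA lines with UCSC-style headers; drop non-canonical records."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            new_id = ensembl_to_ucsc(fields[0]) if fields else None
            keep = new_id is not None
            if keep:
                yield ">" + new_id + "\n"
        elif keep:
            # Sequence lines of a kept record pass through unchanged.
            yield line
```

Usage would look like `dst.writelines(rename_fasta_headers(src))` with `src` and `dst` opened on the input and output FASTA files (file names up to you).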

ekawaler commented 4 years ago

If you get an error opening a file, it means that file doesn't exist at that location. It looks like the reference you're using has a lot of non-canonical chromosomes. You can either leave the genes on those chromosomes out (recommended) or find a reference for those chromosomes.
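Leaving those genes out can be done by filtering the input before running pyQUILTS. The sketch below is a generic filter, not a pyQUILTS feature: it assumes a tab-delimited file (GTF, BED, VCF and similar) whose first column is the chromosome name, keeps `#` header/comment lines, and keeps only rows on chr1-chr22, chrX, chrY and chrM. It accepts names with or without the "chr" prefix, since the errors above show Ensembl-style names such as GL000191.1. The helper names are hypothetical.

```python
import re

# Canonical human chromosomes, with or without a "chr" prefix:
# 1-22, X, Y, and M/MT for the mitochondrion.
CANONICAL_RE = re.compile(r"(?:chr)?(?:[1-9]|1[0-9]|2[0-2]|X|Y|M|MT)")


def is_canonical(chrom: str) -> bool:
    """True if the whole name is a canonical chromosome."""
    return CANONICAL_RE.fullmatch(chrom) is not None


def filter_canonical(lines):
    """Keep '#' lines and rows whose first tab-separated column is canonical."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if is_canonical(chrom):
            yield line
```

Scaffolds (GL...), patches (HG..._PATCH) and UCSC names like chrUn_gl000211 all fail the full match and are dropped.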

summerghw commented 4 years ago

@ekawaler Thank you very much for your reply. I have run into some new issues. Is it normal for variant_proteome.fasta to have many blank entries and some ID errors? Or, before I use this database for searching, do I need to do some preprocessing, such as trypsin digestion, treating isoleucine and leucine as equivalent, or deleting peptides that are already in the RefSeq database?
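For the preprocessing question, here is a rough sketch of the three steps named above, under stated assumptions. It counts blank FASTA entries, digests in silico with a simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages), normalizes I to L, and keeps only variant peptides absent from a reference protein FASTA such as RefSeq. The length limits (7 to 35) and all function names are hypothetical choices. Whether any of this is needed depends on your search engine: many engines digest a protein FASTA themselves, so a peptide-level database may not be required.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs; sequence is '' for blank entries."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def blank_entries(lines):
    """Return headers of FASTA entries that have no sequence."""
    return [h for h, seq in read_fasta(lines) if not seq]


def trypsin_digest(seq, min_len=7, max_len=35):
    """Cleave after K/R unless the next residue is P; filter by length."""
    # Zero-width split needs Python 3.7+.
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    return [p for p in peptides if min_len <= len(p) <= max_len]


def il_normalize(peptide):
    """Treat isoleucine and leucine as identical (same mass)."""
    return peptide.replace("I", "L")


def variant_only_peptides(variant_lines, reference_lines):
    """Map each I/L-normalized variant peptide not in the reference to its first header."""
    ref = set()
    for _, seq in read_fasta(reference_lines):
        ref.update(il_normalize(p) for p in trypsin_digest(seq))
    result = {}
    for header, seq in read_fasta(variant_lines):
        if not seq:  # skip blank entries
            continue
        for pep in trypsin_digest(seq):
            norm = il_normalize(pep)
            if norm not in ref:
                result.setdefault(norm, header)
    return result
```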

It seems the Ensembl genome is not suitable for my data. I tried the RefSeq genome as the reference, and it works well. There is another error message:

/lustre/user/lixue/tools/quilts/pyQUILTS-master/quilts.py:413: UserWarning: Failed to parse #CHROM POS ID REF ALT QUAL
  warnings.warn("Failed to parse %s" % line)

Is this a fatal error, or can I ignore it?
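For context on that warning: in a VCF file, the line starting with `#CHROM` is the column-header line, not a variant record, and `##` lines are meta-information. A `UserWarning` is not an exception, so by itself it does not stop the run. This thread does not confirm how pyQUILTS handles the line beyond the message shown. Note that the VCF specification defines eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), while the warning shows only six, so it may be worth checking that the file is tab-delimited and complete. The sketch below is a hypothetical parser, not pyQUILTS code: it skips header lines and warns only on data lines it cannot split into the fixed columns.

```python
import warnings

# The eight fixed VCF columns, in order.
VCF_FIXED = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_vcf_records(lines):
    """Yield a dict per data line; skip '##' meta lines and the '#CHROM' header."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header and meta-information lines are not records
        fields = line.split("\t")
        if len(fields) < len(VCF_FIXED):
            warnings.warn("Failed to parse %s" % line)
            continue
        record = dict(zip(VCF_FIXED, fields))
        record["POS"] = int(record["POS"])
        yield record
```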