UCSC-LoweLab / tRAX

tRNA Analysis of eXpression
GNU General Public License v3.0

maketrnadb.py error message #1

Closed bm153 closed 3 years ago

bm153 commented 5 years ago

Hi! I'm trying to build a mature tRNA database using the maketrnadb.py script on the command line. I used the following command:

$ python maketrnadb.py --databasename=mature_tRNA_db_GRCh38 --genomefile=/mnt/data/mature_tRNA_db_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa --trnascanfile=/mnt/data/mature_tRNA_db_GRCh38/hg38-tRNAs_name_map.txt --gtrnafafile=/mnt/data/mature_tRNA_db_GRCh38/hg38-tRNAs.fa

I then get the following:

Traceback (most recent call last):
  File "maketrnadb.py", line 73, in <module>
    getmaturetrnas.main(trnascan=[scanfile], genome=genomefile,gtrnafa=gtrnafafile,bedfile=dbname+"-maturetRNAs.bed",maturetrnatable=dbname+"-trnatable.txt",trnaalignment=dbname+"-trnaalign.stk",locibed=dbname+"-trnaloci.bed",maturetrnafa=dbname+"-maturetRNAs.fa")
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/getmaturetrnas.py", line 53, in main
    trnadbtrnas.extend(readtRNAdb(currfile, args["genome"], gtrnatrans))
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/parsetrnas.py", line 259, in readtRNAdb
    curramino = fields[4]
IndexError: list index out of range

This appears after a long list of tRNA loci position names (e.g. tRNA-Ala-AGC-1-1:chr6.trna116). How do I resolve this?

andrewdholmes commented 5 years ago

OK, that looks like there's a line of the tRNAscan file that it's choking on. How did you generate the hg38-tRNAs_name_map.txt file?
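One way to find the line the parser is choking on is to list every line of the input file that has fewer than five fields, since the traceback fails on `fields[4]`. The following is a minimal sketch, not the tRAX implementation: it assumes the parser splits each line on whitespace, which may differ from what `readtRNAdb` actually does, and the helper name `find_short_lines` and the two sample lines are made up for illustration.

```python
def find_short_lines(lines, min_fields=5):
    """Return (line_number, field_count, text) for every non-blank line
    that has fewer than min_fields whitespace-separated fields."""
    short = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        count = len(text.split())
        if count < min_fields:
            short.append((number, count, text))
    return short


# Hypothetical sample: the first line has six fields, the second only two,
# so indexing fields[4] on the second line would raise IndexError.
sample = [
    "chr6\t1\t100\t172\tAla\tAGC\n",
    "tRNA-Ala-AGC-1-1\tchr6.trna116\n",
]

for number, count, text in find_short_lines(sample):
    print(f"line {number}: {count} fields: {text}")
```

To check a real file, pass an open file handle instead of `sample`, for example `find_short_lines(open("hg38-tRNAs_name_map.txt"))`.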

bm153 commented 5 years ago

I downloaded the hg38-tRNAs.tar.gz file from the gtRNAdb which had the hg38-tRNAs_name_map.txt file when decompressed.

andrewdholmes commented 5 years ago

Alright, the file you want for that argument in the hg38 gtRNAdb dump is "hg38-tRNAs-confidence-set.out". "hg38-tRNAs-detailed.out" should also work, but that one contains the more dubious tRNAs.

bm153 commented 5 years ago

So I used both the hg38-tRNAs-confidence-set.out and hg38-tRNAs-detailed-set.out, and I got the following output:

Traceback (most recent call last):
  File "maketrnadb.py", line 73, in <module>
    getmaturetrnas.main(trnascan=[scanfile], genome=genomefile,gtrnafa=gtrnafafile,bedfile=dbname+"-maturetRNAs.bed",maturetrnatable=dbname+"-trnatable.txt",trnaalignment=dbname+"-trnaalign.stk",locibed=dbname+"-trnaloci.bed",maturetrnafa=dbname+"-maturetRNAs.fa")
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/getmaturetrnas.py", line 53, in main
    trnadbtrnas.extend(readtRNAdb(currfile, args["genome"], gtrnatrans))
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/parsetrnas.py", line 306, in readtRNAdb
    trnaseqs = getseqdict(trnalist, faifiles = {orgname:genomefile+".fai"})
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/trnasequtils.py", line 741, in getseqdict
    currseqs = getseqs(fastafiles[currorg], dbdict[currorg], faindex = faifiles[currorg])
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/trnasequtils.py", line 763, in getseqs
    return faifile.getseqs(rangedict)
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/trnasequtils.py", line 849, in getseqs
    genomefile.seek(self.getseek(currchrom,currregion.start))
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/trnasequtils.py", line 832, in getseek
    return self.chromoffset[currchrom] + loc + int(loc/(self.seqlinesize[currchrom]))*(self.seqlinebytes[currchrom] - self.seqlinesize[currchrom])
KeyError: 'chr6'

Also, I converted these files into .txt files and I got the following error:

Traceback (most recent call last):
  File "maketrnadb.py", line 73, in <module>
    getmaturetrnas.main(trnascan=[scanfile], genome=genomefile,gtrnafa=gtrnafafile,bedfile=dbname+"-maturetRNAs.bed",maturetrnatable=dbname+"-trnatable.txt",trnaalignment=dbname+"-trnaalign.stk",locibed=dbname+"-trnaloci.bed",maturetrnafa=dbname+"-maturetRNAs.fa")
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/getmaturetrnas.py", line 53, in main
    trnadbtrnas.extend(readtRNAdb(currfile, args["genome"], gtrnatrans))
  File "/mnt/data/BMOHAMED/TRAX/tRAX-master/parsetrnas.py", line 259, in readtRNAdb
    curramino = fields[4]
IndexError: list index out of range

andrewdholmes commented 5 years ago

OK, so I think the problem there is that you are using the Ensembl genome file with the Ensembl chromosome names, whereas gtRNAdb uses the UCSC genome with its chr-prefixed names (hence the KeyError on 'chr6').

If you want the UCSC genome, you can grab it using "wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz" and then use "tar xvf hg38.chromFa.tar.gz -O > hg38genome.fa" to put it all in one fasta file that then works as input.
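The naming mismatch described above can be checked before running maketrnadb.py by comparing the sequence names in the genome FASTA with the names in the first column of the tRNAscan-SE output. This is a rough sketch rather than part of tRAX: it assumes the .out file has three header lines followed by whitespace-separated rows whose first field is the sequence name, and the function names and sample lines are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names (first word after '>') from FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def scan_names(lines, header_lines=3):
    """Collect sequence names from the first column of a tRNAscan-SE style
    table, skipping an assumed number of header lines."""
    names = set()
    for index, line in enumerate(lines):
        if index < header_lines:
            continue
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_from_genome(scan, genome):
    """Names used in the tRNA table that have no sequence in the genome."""
    return sorted(scan - genome)


# Made-up sample data: an Ensembl-style header ("6") and a UCSC-style row ("chr6").
genome_lines = [">6 dna:chromosome\n", "ACGTACGT\n"]
scan_lines = [
    "Sequence\ttRNA\tBounds\n",
    "Name\ttRNA #\tBegin\tEnd\n",
    "--------\t------\t-----\t---\n",
    "chr6\t1\t100\t172\tAla\tAGC\n",
]

print(missing_from_genome(scan_names(scan_lines), fasta_names(genome_lines)))
```

Any name this reports is one the genome file cannot supply, which is the situation that produces a KeyError like the one above.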
