genomeFile or GFF3 issue?

atulkakrana / sPARTA

miRNA-target prediction and PARE-seq based validation tool - uses MapReduce model [Published]

genomeFile or GFF3 issue? #1

Closed pedrocrisp closed 8 years ago

pedrocrisp commented 8 years ago

Hi, I get the following error when running sPARTA, just as it finishes creating the FASTA feature file:

Traceback (most recent call last):
  File "sPARTA.py", line 2034, in <module>
    main()
  File "sPARTA.py", line 1785, in main
    unambiguousBaseCounter(fastaOut, args.minTagLen)
  File "sPARTA.py", line 1468, in unambiguousBaseCounter
    baseCountsOffTagLen += (len(currentLine) - 2 * minTagLen) - currentLine[
TypeError: unsupported operand type(s) for -: 'int' and 'str'

Is this a reference issue? Any ideas? I am using the TAIR10 GFF3 and .fa files.

This is the command I am running:

python3 sPARTA.py \
-genomeFile TAIR10.fa \
-gffFile TAIR10_GFF3_genes.gff \
-genomeFeature 0 \
-miRNAFile RRGS_miRBase_miRNA_master_20160419.fa \
-libs reads_noadapt_cutadapt_20nt_tags/Sample_317_* \
-tarPred -tarScore --tag2FASTA --map2DD --validate \
-minTagLen 20

Thanks for your help.

Cheers, Peter

atulkakrana commented 8 years ago

Hi Pedro,

Sorry for the late response. I was travelling and somehow missed this e-mail. Let me check this tomorrow.

Atul

rkweku commented 8 years ago

Hi Peter,

Apologies for the late reply; it is my fault. I hope that you are still interested in running sPARTA.

You stumbled upon a bug in the code that I had previously missed. I typically run sPARTA without specifying minTagLen, letting the code fall back to its default of 20. Specifying it on the command line makes the variable the string '20' rather than the integer 20. I have fixed this bug, so the same command as before should now run fine.
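To illustrate the kind of bug described above, here is a minimal sketch. It is not sPARTA's actual argument-handling code: the toy parser and the `build_parser` helper are hypothetical, and only the `-minTagLen` flag name and its default of 20 come from this thread. It assumes the option was declared without a type, in which case `argparse` hands back a string, `2 * minTagLen` becomes the string `'2020'`, and subtracting that from an integer raises the `TypeError` shown in the traceback.

```python
import argparse


def build_parser(coerce_int):
    """Build a toy parser (hypothetical); coerce_int toggles the type=int fix."""
    parser = argparse.ArgumentParser()
    if coerce_int:
        # With type=int, command-line values are converted before use.
        parser.add_argument("-minTagLen", type=int, default=20)
    else:
        # Without a type, command-line values stay strings; the default stays an int.
        parser.add_argument("-minTagLen", default=20)
    return parser


buggy = build_parser(coerce_int=False).parse_args(["-minTagLen", "20"])
fixed = build_parser(coerce_int=True).parse_args(["-minTagLen", "20"])
default = build_parser(coerce_int=False).parse_args([])

# The default path works, which is why the bug can go unnoticed.
print(type(default.minTagLen).__name__, type(buggy.minTagLen).__name__)

current_line = "ACGT" * 30  # stand-in for a sequence line
try:
    # 2 * '20' is the string '2020', so int - str fails.
    len(current_line) - 2 * buggy.minTagLen
except TypeError as err:
    print("buggy:", err)

print("fixed:", len(current_line) - 2 * fixed.minTagLen)
```

Declaring the type on the argument keeps the conversion in one place; wrapping each use in `int(...)` would also work but is easier to miss.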

If you haven't already, make sure you get the genome and GFF files from https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas and https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_gff3/TAIR10_GFF3_genes.gff respectively.

If you have any problems, please don't hesitate to email us. We will try to get back to you much more quickly than this time.

Thank you, Reza Hammond

pedrocrisp commented 8 years ago

Hi Reza, yep, we found that bug too. Fixing it resolved that issue. We also had some issues when using files that were not in the root of the directory we were running the script from, so we used hard links, and that worked. We also had some issues with calling pyfasta split via subprocess.call, so Kevin from our lab (@kdmurray91) wrote a quick little Python function to do this within sPARTA.py instead (happy to make a pull request to share this with you). We also got caught out by sample names containing underscores, so we renamed them.

Then it worked! About to take a look at the results.

Thanks for the tool.

Cheers, Peter

atulkakrana commented 8 years ago

Hi Peter (and Reza),

Thanks for reporting the bug. I am glad that it's fixed. sPARTA not picking up files from outside the root directory is strange, but we have a new version coming that should take care of it. Also, we are removing the "pyfasta" requirement from the next release; we have had that in mind since the first release. Would you like to share your substitute for "pyfasta"? If it's faster than what we have here, then it would be a nice contribution to sPARTA.

Best

Atul

kdm9 commented 8 years ago

To parse fasta sequences I used screed.

I'll get the code I wrote to split the fasta from Pete tomorrow.
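
The function mentioned above is not shown in this thread. For readers who want to avoid shelling out to pyfasta in the meantime, here is a minimal sketch of one way to split a FASTA file in pure Python. It is an assumption-laden illustration, not the code written for sPARTA: it does not use screed, the `split_fasta` name is hypothetical, and it writes one file per record, which may differ from what `pyfasta split` produces. It also assumes every header has a non-empty name that is safe to use as a file name.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each record of a FASTA file to its own file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    # A new header closes the previous record's file.
                    if handle is not None:
                        handle.close()
                    # Assumption: the first token of the header is a usable file name.
                    name = line[1:].split()[0]
                    out_path = os.path.join(out_dir, name + ".fa")
                    handle = open(out_path, "w")
                    written.append(out_path)
                # Lines before the first header are skipped.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

Calling `split_fasta("TAIR10.fa", "chunks")` would, under these assumptions, produce one `.fa` file per chromosome in a `chunks` directory.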