Kuanhao-Chao / splam

✂️ Deep learning-based splice site predictor that improves spliced alignments
http://ccb.jhu.edu/splam/

Splam for non human species #1

Closed arslan9732 closed 3 months ago

arslan9732 commented 1 year ago

Hi, I am trying to run Splam on plant data, using StringTie output. When I run splam score with the -A parameter, it gives me this error:

Traceback (most recent call last):
  File "/home/drone/splam/splam.venv/bin/splam", line 11, in <module>
    load_entry_point('splam==1.0.2', 'console_scripts', 'splam')()
  File "/home/drone/splam/splam.venv/lib/python3.8/site-packages/splam-1.0.2-py3.8-linux-x86_64.egg/splam/main.py", line 203, in main
    donor_bed, acceptor_bed = parse.create_donor_acceptor_bed(junction_bed, outdir, assembly_report)
  File "/home/drone/splam/splam.venv/lib/python3.8/site-packages/splam-1.0.2-py3.8-linux-x86_64.egg/splam/parse.py", line 29, in create_donor_acceptor_bed
    chrs = chr_size.get_chrom_size(assembly_report, 'chr')
  File "/home/drone/splam/splam.venv/lib/python3.8/site-packages/splam-1.0.2-py3.8-linux-x86_64.egg/splam/chr_size.py", line 20, in get_chrom_size
    refseq_name = columns[9]

My genome length file looks like this:

chr01   252377257
chr02   228357935
chr03   227991009
chr04   217148095
chr05   212042709
chr06   211849564
chr07   226572831
chr08   207533968
chr09   196681080

I also tried to make a length file modeled on the example file, and it looks like this:

# Sequence-Name Sequence-Role   Assigned-Molecule       Assigned-Molecule-Location/Type GenBank-Accn    Relationship    RefSeq-Accn     Assembly-Unit   Sequence-Length UCSC-style-name
1       assembled-molecule      1       Chromosome              =               chr01   252377257       chr01
2       assembled-molecule      2       Chromosome              =               chr02   228357935       chr02
3       assembled-molecule      3       Chromosome              =               chr03   227991009       chr03
4       assembled-molecule      4       Chromosome              =               chr04   217148095       chr04
5       assembled-molecule      5       Chromosome              =               chr05   212042709       chr05
6       assembled-molecule      6       Chromosome              =               chr06   211849564       chr06
7       assembled-molecule      7       Chromosome              =               chr07   226572831       chr07
8       assembled-molecule      8       Chromosome              =               chr08   207533968       chr08
9       assembled-molecule      9       Chromosome              =               chr09   196681080       chr09

Then it gave me this error:

[Info] Chromosomes in the annotation file is in 'chr*' style
Traceback (most recent call last):
  File "/home/drone/splam/splam.venv/bin/splam", line 11, in <module>
    load_entry_point('splam==1.0.2', 'console_scripts', 'splam')()
  File "/home/drone/splam/splam.venv/lib/python3.8/site-packages/splam-1.0.2-py3.8-linux-x86_64.egg/splam/main.py", line 203, in main
    donor_bed, acceptor_bed = parse.create_donor_acceptor_bed(junction_bed, outdir, assembly_report)
  File "/home/drone/splam/splam.venv/lib/python3.8/site-packages/splam-1.0.2-py3.8-linux-x86_64.egg/splam/parse.py", line 67, in create_donor_acceptor_bed
    if splice_junc_len < config.QUARTER_SEQ_LEN:
UnboundLocalError: local variable 'splice_junc_len' referenced before assignment

am12 commented 1 year ago

Hi Arslan,

Thanks for bringing this up. Splam currently depends on the exact formatting of the ASSEMBLY_REPORT.txt file, which we download from the NCBI FTP site (the GRCh38 report is one example). Notice that the UCSC-style names and sequence lengths are in the 10th and 9th columns, respectively. It seems that the length file you created from the example had them in the 8th and 7th columns.

In a future release, we will obtain this information from the FASTA file directly, but for now I would recommend copying the tab-delimited assembly report format exactly and replacing the pertinent columns (9 and 10) with the corresponding information from your species. Hope this helps!

-Alan
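
For anyone in the same situation, here is a minimal sketch of the workaround described above: it turns a two-column length file (name, length) into a 10-column, tab-delimited file with the sequence length in column 9 and the UCSC-style name in column 10. The function name `lengths_to_assembly_report` is hypothetical, and the placeholder values in the other columns ("na", "Primary Assembly") are assumptions; only the header and the positions of columns 9 and 10 come from this thread, and whether Splam inspects the remaining columns has not been checked.

```python
# Header copied from the assembly-report example quoted earlier in this thread.
HEADER = [
    "# Sequence-Name", "Sequence-Role", "Assigned-Molecule",
    "Assigned-Molecule-Location/Type", "GenBank-Accn", "Relationship",
    "RefSeq-Accn", "Assembly-Unit", "Sequence-Length", "UCSC-style-name",
]


def lengths_to_assembly_report(lengths_text):
    """Build a 10-column, tab-delimited report from 'name length' lines.

    Hypothetical helper: columns 1-8 are filled with placeholders (an
    assumption); column 9 is the length and column 10 the chromosome name.
    """
    rows = ["\t".join(HEADER)]
    # Skip blank lines; split each record on any whitespace (tabs or spaces).
    records = [ln.split() for ln in lengths_text.splitlines() if ln.strip()]
    for index, (name, length) in enumerate(records, start=1):
        int(length)  # raises ValueError if the length is not an integer
        fields = [
            str(index), "assembled-molecule", str(index), "Chromosome",
            "na", "=", "na", "Primary Assembly",  # placeholder values
            length, name,                          # columns 9 and 10
        ]
        rows.append("\t".join(fields))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    example = "chr01\t252377257\nchr02\t228357935\n"
    print(lengths_to_assembly_report(example), end="")
```

The output would be saved to a file and supplied through the -A parameter in place of the original length file. Explicit tab joins are used so that every row has exactly ten fields, which avoids the column shift seen in the hand-made file above.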