estolle closed this issue 11 months ago
Reformatting the fake assembly file didn't fix the issue:
scaffold_1 assembled-molecule 1 Chromosome XX000001.1 = NC_000001.1 CfrieseiERGA 37766881 chr1
scaffold_2 assembled-molecule 2 Chromosome XX000002.1 = NC_000002.1 CfrieseiERGA 36484387 chr2
scaffold_3 assembled-molecule 3 Chromosome XX000003.1 = NC_000003.1 CfrieseiERGA 30962093 chr3
scaffold_4 assembled-molecule 4 Chromosome XX000004.1 = NC_000004.1 CfrieseiERGA 29900042 chr4
It is still failing, apparently because of unexpected chromosome names in the FASTA/BAM/BED files: "scaffold_1"
Any suggestions on how to make splam accept this? Otherwise we cannot use splam for any new genome (and there are a lot coming).
Thanks
Hi @estolle,
Thanks for raising this issue. I have just released a new version. You can check it out here: https://github.com/Kuanhao-Chao/splam/tree/v1.0.3. Now, there's no need to provide an assembly_report file. Splam directly reads the length of each chromosome from the FASTA file.
Feel free to let us know if you encounter any issues running Splam v1.0.3.
Kuan-Hao
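For readers wondering what "reads the length of each chromosome from the FASTA file" can look like in practice, here is a minimal sketch in plain Python. It is not Splam's actual implementation; the function name `fasta_lengths` and the demo sequences are made up for illustration. It assumes the sequence name is the first whitespace-delimited token of each header line.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA text stream.

    Hypothetical helper for illustration, not part of Splam.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header: keep only the first token, e.g. ">scaffold_1 desc" -> "scaffold_1"
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its residues to the current record
            lengths[name] += len(line)
    return lengths


# Tiny made-up FASTA, only to show the shape of the result
demo = io.StringIO(">scaffold_1 demo record\nACGT\nAC\n>scaffold_2\nGGG\n")
print(fasta_lengths(demo))  # {'scaffold_1': 6, 'scaffold_2': 3}
```

Because the names come straight from the FASTA headers, a dictionary built this way is keyed by whatever naming the assembly uses ("scaffold_1", "chr1", accessions), so no assembly report is needed to translate between styles.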
Hi there
I have a similar problem to the person in the other issue: a non-human species. It is not an NCBI/RefSeq genome but a newly assembled genome, hence we do not have an assembly report. It would be extremely convenient for all users of non-human/non-model species if splam could use something more generic, say a .fai file.
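To illustrate the .fai idea: a FASTA index as written by `samtools faidx` is a tab-separated file whose first two columns are the sequence name and its length. The sketch below reads those two columns into a dictionary. The function name `fai_lengths` is hypothetical, and the offset and line-width columns in the demo are placeholder values, not taken from a real index; only the names and lengths come from this thread.

```python
import io


def fai_lengths(handle):
    """Return {sequence_name: length} from a .fai text stream.

    Hypothetical helper for illustration. Only columns 1 (name) and
    2 (length) are used; the remaining index columns are ignored.
    """
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths


# Names and lengths from this thread; columns 3-5 are placeholders
demo = io.StringIO(
    "scaffold_1\t37766881\t12\t60\t61\n"
    "scaffold_2\t36484387\t38396354\t60\t61\n"
)
print(fai_lengths(demo))  # {'scaffold_1': 37766881, 'scaffold_2': 36484387}
```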
I already tried to create a fake assembly report but could you specify which parts have to be present and how they are expected to be formatted?
This is the error:
[Info] Chromosomes in the annotation file is in 'NCBI RefSeq' style
Traceback (most recent call last):
  File "/home/ek/virtualenvs/splam/bin/splam", line 8, in <module>
    sys.exit(main())
  File "/home/ek/virtualenvs/splam/lib/python3.8/site-packages/splam/main.py", line 203, in main
    donor_bed, acceptor_bed = parse.create_donor_acceptor_bed(junction_bed, outdir, assembly_report)
  File "/home/ek/virtualenvs/splam/lib/python3.8/site-packages/splam/parse.py", line 83, in create_donor_acceptor_bed
    if donor_e >= chrs[chr] or acceptor_e >= chrs[chr]:
KeyError: 'scaffold_1'
My fake assembly file:
scaffold_1 unplaced-scaffold na na xxxxxx.1 = yyyyyy.1 zzzzz 37766881 chr1
scaffold_2 unplaced-scaffold na na xxxxxx.1 = yyyyyy.1 zzzzz 36484387 chr2
scaffold_3 unplaced-scaffold na na xxxxxx.1 = yyyyyy.1 zzzzz 30962093 chr3
scaffold_4 unplaced-scaffold na na xxxxxx.1 = yyyyyy.1 zzzzz 29900042 chr4
junction.bed file:
scaffold_1 4213 4613 JUNC00000001 16 +
scaffold_1 4683 5687 JUNC00000002 5 +
scaffold_1 5106 5432 JUNC00000003 13 -
scaffold_1 6460 8830 JUNC00000004 12 +
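As a way to diagnose a `KeyError` like the one above before running the tool, the sketch below checks each junction against a name-to-length dictionary and reports sequences that are missing from it instead of raising. This is a simplified stand-in, not the check in Splam's `parse.py`: the function name `out_of_range_junctions` is hypothetical, and the bounds test assumes BED-style half-open coordinates (end may equal the sequence length).

```python
def out_of_range_junctions(bed_lines, lengths):
    """Return (junction_name, reason) pairs for junctions that cannot be placed.

    Hypothetical helper for illustration. `lengths` maps sequence name
    to sequence length; `bed_lines` are whitespace-separated BED6 lines.
    """
    problems = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a full BED6 record
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        chrom_len = lengths.get(chrom)  # .get avoids a KeyError on unknown names
        if chrom_len is None:
            problems.append((name, "unknown sequence " + chrom))
        elif start < 0 or end > chrom_len:
            problems.append((name, "outside " + chrom))
    return problems


# Length and junctions taken from this thread
lengths = {"scaffold_1": 37766881}
junctions = [
    "scaffold_1 4213 4613 JUNC00000001 16 +",
    "scaffold_1 4683 5687 JUNC00000002 5 +",
    "scaffold_1 5106 5432 JUNC00000003 13 -",
    "scaffold_1 6460 8830 JUNC00000004 12 +",
]
print(out_of_range_junctions(junctions, lengths))  # []

# If the dictionary is keyed by a different naming style, every junction is flagged
print(out_of_range_junctions(junctions[:1], {"chr1": 37766881}))
# [('JUNC00000001', 'unknown sequence scaffold_1')]
```

The second call mirrors the situation in the traceback: the junction file uses "scaffold_1", but the lookup table does not contain that key.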