LanLab / salty

The Staphylococcus aureus Lineage Typer (SaLTy)
GNU General Public License v3.0

Splitting at "_" to get output file prefix causes filenames to be truncated #4

Closed VishnuRaghuram94 closed 6 months ago

VishnuRaghuram94 commented 1 year ago

GenBank and RefSeq accessions are formatted like GCA_900620225.1, and the assembly files are named like GCA_900620225.1_BPH2819_genomic.fna. Running salty on such files truncates the output folder name to just GCA, so all similarly named files end up written to the same output files with the prefix GCA and overwrite each other.

I am not sure, but this might be the line that is causing it.
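The issue title attributes the bug to splitting the file name at "_". The link to the suspected line did not survive, so the snippet below is a hypothetical reconstruction, not salty's actual code: `naive_prefix` mimics the split described in the title, and `safe_prefix` is one possible alternative that keeps the whole file name and drops only a known FASTA extension. Both function names and the suffix list are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical list of FASTA extensions to strip; not taken from salty itself.
FASTA_SUFFIXES = (".fna", ".fasta", ".fa", ".fas")


def naive_prefix(path: str) -> str:
    """Mimic the behaviour in the issue title: keep everything before the first '_'."""
    return Path(path).name.split("_")[0]


def safe_prefix(path: str) -> str:
    """Keep the full file name and remove only a recognised FASTA extension."""
    name = Path(path).name
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name  # no known extension: use the name unchanged


files = [
    "salty_test/GCA_900474725.1_40677_A01_genomic.fna",
    "salty_test/GCA_900620225.1_BPH2819_genomic.fna",
]

# Both inputs collapse to the same prefix, so their outputs collide.
print([naive_prefix(f) for f in files])  # ['GCA', 'GCA']
# Each input keeps a distinct prefix.
print([safe_prefix(f) for f in files])
```

Stripping only an explicit list of extensions, rather than relying on `Path.stem`, avoids cutting a bare accession such as `GCA_900474725.1` down to `GCA_900474725`, because `pathlib` treats `.1` as a suffix.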

Steps to reproduce

Download sample genomes from NCBI

datasets download genome accession "GCA_900474725.1" "GCA_900620225.1"
unzip ncbi_dataset.zip
mkdir salty_test; cp ncbi_dataset/data/GCA_900*/GCA_900*.fna salty_test/

Run salty

salty --version
salty version: 1.0.4
mkdir salty_test_output
salty --input_folder salty_test --output_folder ./salty_test_output

Output:

# Reading inputfile:    salty_test/GCA_900474725.1_40677_A01_genomic.fna
#
# Total time used for DB loading: 0.01 s.
#
# Finding k-mer ankers
# Running KMA.
#
# Total number of query fragment after trimming:        1
#
# Query converted
#
# Query ankered
#
# KMA mapping done
#
# Sort, output and select KMA alignments.
# Total time for sorting and outputting KMA alignment   0.01 s.
#
# Doing local assemblies of found templates, and output results
# Total time used for local assembly: 0.14 s.
#
# Closing files
Passed:          gene:SACOL0451  allele:1
Passed:          gene:SACOL1908  allele:111
Passed:          gene:SACOL2725  allele:1
/home/vraghu2/miniconda3/envs/salty/lib/python3.11/site-packages/salty/salty.py:78: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  filtLineageAllelesDF = filtLineageAllelesDF.append(filtAllele, sort=True)
GCA: writing output.
{'Lineage': 11, 'SACOL1908': 111, 'SACOL0451': 1, 'SACOL2725': 1}
Error: Directory Exists: ./salty_test_output/GCA
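Separately from the naming bug, the FutureWarning in the log comes from `DataFrame.append`, which pandas deprecated in 1.4 and removed in 2.0. A minimal sketch of the `pandas.concat` replacement the warning recommends, assuming `filtAllele` is a DataFrame; the two frames below are hypothetical stand-ins, not salty's real data:

```python
import pandas as pd

# Hypothetical stand-ins for the frames named in the warning.
filtLineageAllelesDF = pd.DataFrame({"Lineage": [11], "SACOL1908": [111]})
filtAllele = pd.DataFrame({"Lineage": [11], "SACOL0451": [1]})

# Equivalent of filtLineageAllelesDF.append(filtAllele, sort=True):
# rows are stacked, and sort=True sorts the union of column names.
filtLineageAllelesDF = pd.concat([filtLineageAllelesDF, filtAllele], sort=True)
print(filtLineageAllelesDF)
```

Columns missing from one frame are filled with NaN in the combined result, as they were with `append`.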
mjohnpayne commented 6 months ago

Sorry for the very long delay! This should be fixed now. I will update the conda package to the latest version shortly.

mjohnpayne commented 6 months ago

Version 1.0.6 fixes this issue and is now released and available from Bioconda. Again, sorry for the delay, and thanks for finding this!