TcbfGroup / Tcbf

A pipeline for identifying conserved topologically associating domain boundaries among multiple species.
MIT License

ERROR: Did not find fasta records in "input files". #2

Open jdamas13 opened 2 months ago

jdamas13 commented 2 months ago

Hi!

I'm trying to use your tool for comparing TAD boundaries between two species.

I keep running into the error "Did not find fasta records in input files" at the mash sketch step. When I look into the Step1 folder I can see my genome files were either not copied or only partially copied.

Any idea how to fix this problem?

Thank you!

TcbfGroup commented 2 months ago

I think this issue may be caused by an incorrect format of the genome fasta file. Could you please take a screenshot displaying the content of the fasta file? You can use the following command to preview the fasta file:

less -S genome.fa

The -S option makes less chop long lines instead of wrapping them, so each header and sequence line stays on a single row and the format is easy to check. If the format looks fine, we will need to investigate further. Note that gzipped files are not supported.
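For a quick check without a screenshot, the same inspection can be sketched in Python. This is only an illustration: `check_fasta` is a hypothetical helper, not part of Tcbf, and it tests just two things mentioned in this comment (the file is not gzipped, and the first non-blank line is a FASTA header).

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def check_fasta(path):
    """Return a list of problems found in the file (empty list if none).

    Hypothetical helper for illustration; not part of Tcbf.
    """
    with open(path, "rb") as handle:
        # A gzipped genome is rejected up front, since it is not supported.
        if handle.read(2) == GZIP_MAGIC:
            return ["file is gzip-compressed; decompress it first"]
        handle.seek(0)
        # Find the first line that is not just whitespace.
        first = b""
        for line in handle:
            if line.strip():
                first = line
                break
    if not first:
        return ["file has no non-blank lines"]
    if not first.startswith(b">"):
        return ["first non-blank line does not start with '>'"]
    return []
```

Calling `check_fasta("genome.fa")` returns an empty list when neither problem is found. It does not validate the sequence lines themselves.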

jdamas13 commented 2 months ago

I don't believe it is, as I tried to run mash sketch on the original file, and it worked. For some reason, the fasta files are not copied to the working directory. I'm not sure why.
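One way to confirm that a copy is missing or truncated is to compare the number of FASTA records in the original and in the working-directory copy. The sketch below is only an illustration: the function names are hypothetical, and the paths passed in are whatever the original genome and its Step1 copy happen to be called.

```python
import os


def count_fasta_records(path):
    """Count header lines (lines starting with '>') in a FASTA file."""
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


def copy_is_complete(original, copy):
    """Return True if `copy` exists and has as many records as `original`."""
    if not os.path.exists(copy):
        return False
    return count_fasta_records(original) == count_fasta_records(copy)
```

A `False` result means the copy is absent or has a different record count. Matching counts do not prove the sequences are intact, only that no record is missing.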

Either way, I did look at it with less -S, and it looks good to me (see screenshot below).

jdamas13 commented 2 months ago

I believe I have solved this issue. The problem seemed to be that add_prefix() is a long-running function, and Python did not wait for it to finish before starting mash_genome(). So, I replaced add_prefix with two bash commands (below).

# add_prefix(genome, prefix, genome_file, tad)  # original line
# Copy the genome, then add the prefix to header lines only (lines starting with ">").
os.system(f"cp '{genome}' '{genome_file}'")
os.system(f"sed -i 's/^>/>{prefix}_/' '{genome_file}'")
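For anyone who prefers to avoid shelling out, the same copy-and-prefix step can be sketched in plain Python. This is an assumption-laden illustration: `copy_with_prefix` is a hypothetical name, it is not the Tcbf implementation of add_prefix, and it ignores the `tad` argument because what add_prefix does with it is not shown in this thread.

```python
def copy_with_prefix(genome, prefix, genome_file):
    """Copy `genome` to `genome_file`, adding `prefix_` to every header.

    Hypothetical stand-in for the two shell commands; not the Tcbf code.
    """
    with open(genome) as src, open(genome_file, "w") as dst:
        for line in src:
            # Only header lines start with ">"; sequence lines pass through.
            if line.startswith(">"):
                line = f">{prefix}_{line[1:]}"
            dst.write(line)
    # Leaving the `with` block closes and flushes both files, so the
    # output is fully written before the next step reads it.
```

The function returns only after the output file has been closed, so a later step such as mash_genome() should see the complete file.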