cmks / DAS_Tool

[Feature Request] Accept empty scaffolds_to_bins files #67

Closed jolespin closed 3 years ago

jolespin commented 3 years ago

What are your thoughts on having DASTool accept empty scaffolds-to-bins files? The reasoning is that DASTool is typically included in pipelines, and sometimes one binner doesn't find any bins while others do. The pipeline breaks because of this. Would DASTool be able to accept such a file and issue a warning saying that it is empty?

cmks commented 3 years ago

It is possible to add this feature. I'm about to make some general changes to the code and can also include better handling of empty input files.

zoey-rw commented 3 years ago

Hi, I am wondering if there is a time estimate on this feature request (handling empty input files), or if anyone has recommendations for temporary workarounds for use within pipelines.

Thank you for all your awesome work on DAS Tool!!

jolespin commented 3 years ago

I have a for-loop somewhere that basically says:

s2b=""
for FP in "list" "of" "scaffold2bins"; do
  echo "$FP"
  # append only files that exist and are not empty
  if [ -s "$FP" ]; then s2b="${s2b}${FP},"; fi
done

This is only a rough sketch, but that's what I ended up doing. Not sure where the actual bash code is so I can't be much more help. Some Stack Overflow searching should help with the details tho.

cmks commented 3 years ago

Hi @zoey-rw, as @jolespin proposed, a quick workaround is a script that checks the size of your input files. The following bash example uses two scaffold2bin files from the sample_data directory of this repo and adds one empty file (sample.human.gut_emptyBin_scaffolds2bin.tsv):

# create empty file:
touch sample_data/sample.human.gut_emptyBin_scaffolds2bin.tsv
# define input files including empty file:
scaffoldstobins='sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,sample_data/sample.human.gut_emptyBin_scaffolds2bin.tsv,sample_data/sample.human.gut_metabat_scaffolds2bin.tsv'

# check if any scaffold2bin files are empty:
s2b_tmp=''
for i in $(echo "${scaffoldstobins}" | tr "," " ")
do
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "${i}" ]
  then
    echo "Warning: scaffolds2bin file is empty: ${i}"
  else
    s2b_tmp="${s2b_tmp},${i}"
  fi
done
# remove initial ',':
s2b_tmp=${s2b_tmp#","}

# check if all scaffold2bin files are empty:
if [ -z "${s2b_tmp}" ]
then
  echo "Warning: All input files are empty."
else
  scaffoldstobins=${s2b_tmp}
fi

echo "${scaffoldstobins}"

Now, you would run DAS Tool with -i $scaffoldstobins as input.
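If you also pass bin-set labels with -l, the label list has to be filtered together with the file list, otherwise labels and files no longer line up. Below is a minimal sketch of that idea, not part of DAS Tool itself: the function name filter_nonempty_s2b, the variables kept_files and kept_labels, and the temporary demo files are all made up for illustration, and the DAS_Tool call is left commented out with placeholder contig and output names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DAS Tool): keep only the non-empty
# scaffolds2bin files and the labels that belong to them.
#   $1: comma-separated list of scaffolds2bin files
#   $2: comma-separated list of labels, in the same order
# Results are stored in the global variables kept_files and kept_labels.
filter_nonempty_s2b() {
  local IFS=','
  local -a files labels
  read -r -a files <<< "$1"
  read -r -a labels <<< "$2"
  kept_files=''
  kept_labels=''
  local n
  for n in "${!files[@]}"; do
    # -s is true only if the file exists and has a size greater than zero
    if [ -s "${files[$n]}" ]; then
      kept_files="${kept_files:+${kept_files},}${files[$n]}"
      kept_labels="${kept_labels:+${kept_labels},}${labels[$n]}"
    else
      echo "Warning: scaffolds2bin file is empty: ${files[$n]}" >&2
    fi
  done
}

# Demo with throwaway files (the names are placeholders for illustration).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'contig_1\tbin.1\n' > "$tmpdir/concoct.tsv"
: > "$tmpdir/maxbin.tsv"                      # empty binner output
printf 'contig_2\tbin.7\n' > "$tmpdir/metabat.tsv"

filter_nonempty_s2b \
  "$tmpdir/concoct.tsv,$tmpdir/maxbin.tsv,$tmpdir/metabat.tsv" \
  "concoct,maxbin,metabat"

echo "files:  ${kept_files}"
echo "labels: ${kept_labels}"

# Only call DAS Tool if at least one input survived the filter.
# contigs.fa and das_out are placeholder names:
# if [ -n "${kept_files}" ]; then
#   DAS_Tool -i "${kept_files}" -l "${kept_labels}" -c contigs.fa -o das_out
# fi
```

Skipping the DAS Tool call when every input is empty lets the pipeline decide what to do in that case instead of failing inside the tool.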

cmks commented 3 years ago

Alternatively, you can check out the new empty_input_file_fix branch, which is able to handle empty scaffold2bin files.

cmks commented 3 years ago

This feature has been merged into the master branch. Closing this ticket.

jolespin commented 3 years ago

Is this in v1.1.3?

cmks commented 3 years ago

Yes.