cmks / DAS_Tool

Column 1 of result for group 3421 is type 'double' but expecting type 'integer' #99

Closed huizhen-yan closed 10 months ago

huizhen-yan commented 1 year ago

Hi, I ran DAS_Tool and got the following error.

time DAS_Tool -i z1.Contig2bin.tsv,p1.Contig2bin.tsv,p2.Contig2bin.tsv -l z1,p1,p2 -c contigs.fa -t 56 --write_bins -o all_bins_dastool
DAS Tool 1.1.6
Analyzing assembly
Predicting genes
Annotating single copy genes using diamond
Dereplicating, aggregating, and scoring bins
Error in `[.data.table`(bin_tab_contig, , .(binSize = calc_bins_size(contig_id,  :
  Column 1 of result for group 3421 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
Calls: cherry_pick -> score_bins -> %>% -> setkey -> [ -> [.data.table
In addition: Warning message:
In calc_N50(contig_id, contig_length) :
  integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
Execution halted

real 158m57.298s
user 5466m32.675s
sys 10m37.638s

Here are the input files.

head z1.Contig2bin.tsv 
k127_5405675 z1.bin.1000
k127_7207940 z1.bin.1000
k127_7208440 z1.bin.1000
k127_4507492 z1.bin.1000
k127_12615522 z1.bin.1000
k127_4508035 z1.bin.1000
k127_10814506 z1.bin.1000
k127_3606553 z1.bin.1000
k127_10815671 z1.bin.1000

head p1.Contig2bin.tsv
k127_21624164 p1.bin.1000
k127_11266923 p1.bin.1000
k127_14421405 p1.bin.1000
k127_19376857 p1.bin.1000
k127_13167 p1.bin.1000
k127_9923194 p1.bin.1000
k127_8577692 p1.bin.1000
k127_5878300 p1.bin.1000
k127_20739313 p1.bin.1000

head contigs.fa
>k127_13963658 flag=0 multi=51.4235 len=1029
TGAAGTCGACGTTGAACAGAAAGCCGAGGCCGGCCGCAAAATCGCCGGCATACGCCGCGTAGCCGATCATGACCAGCATCATCGCAAACAGCGAGGGCATCAGCATCTTCACGGCCTTCTCGATGCCATCCTGCAGGCCACGGCCGACGATGGAGAGCGCGATGGCGATGAACACCGTGTGCCAGAGGGTCATCGTCACCGGGTCCGCCAGCAACCCGTCGAACTGCCCCGCCACCTCGAGCGGACCGGCGCCGCTGAAGCCGCCCGCCGCCTTGCCGATGTAGCTCAGCGTCCAGCCGGCGATGACGCTGTAGTAGGTCGCGATCAGGAAGCCGACGATTGTCCCCATCCAGCCGACGATGCGCCAGGCCCTGGAGCGGCCGGCACTCGCGGCGAGCGTCGACATGGCCACCGGCGGGCTGCTCGCGCCACGACGGCCGATGAGTAGTTCCGCGATGAGGATCGGGATGGCGACGAAGACCACGCAGGCGAGGTAGACCAGCACGAAGGCGCCGCCGCCGCTGACGCCGGCAACGAACGGGAACTTCCAGATATTACCGAGGCCGACCGCCGCGCCGACCGCGGCGAGGATGAACGTGAAACCCGAAGACCAGTTCTGTGTGCTGCCTGTGCCTGCCATTAGTTGCTCGCTTGTTGGTGGTTATCCAGTACGCGGCCGGGATTCAGGATATTGCTGGGGTCGAGCGCGGACTTCAACGCGCGCATGAGCCCGATCTCCGCGGCCGTGCGGCTGTGCGGCAGCCACTTGAGTTTTTCCGTGCCGATACCGTGCTCCGCCGAAACCGAGCCGCCGATATCAGTGAGCGGCCCGTACACGCACTCATCGCTGGTCTCGTGATGGTCGCCCTCGGCGTTCGGCGCGACGAAGAAATGCAGGTTGCCGTCGGCAACGTGACCTATCGTATAGCACTCACCGCGCGGCCAGCGCCCCCTGACATGGGTTTTCACCGCCTCGACGTAAGCGGCCATGCTGCGAATCGGCAGGCTGACATCGTACAAATAGACCGG

How can I fix this?

vinisalazar commented 1 year ago

Having this same issue with versions 1.1.4 and 1.1.6.

dportik commented 11 months ago

Bumping this issue as I am having a similar problem:

DAS Tool 1.1.6 
Analyzing assembly 
Warning message:
In calc_N50(contigTab[, contig_id], contigTab[, contig_length]) :
  integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
Predicting genes 
Annotating single copy genes using diamond 
Dereplicating, aggregating, and scoring bins 
Error in `[.data.table`(bin_tab_contig, , .(binSize = calc_bins_size(contig_id,  : 
  Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
Calls: cherry_pick -> score_bins -> %>% -> setkey -> [ -> [.data.table
In addition: Warning message:
In calc_N50(contig_id, contig_length) :
  integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
Execution halted

vinisalazar commented 11 months ago

@cmks I understand that this time of the year is quite busy, but would it be possible to give some attention to this issue for the next version of DAS Tool? Please let us know if there's any way to help.

Many thanks, Vini

cmks commented 11 months ago

Hi all, thanks for reporting this bug. I've just pushed a fix, but I can't test it because I'm not able to replicate the issue. Can you please re-run your data with the new version and tell me whether it works? You can either install the pre-release, DAS Tool 1.1.7-b.1, or check out this branch: issue_99

vinisalazar commented 11 months ago

Thank you @cmks, not sure if this is of any help, but I seem to only get this issue with fairly large datasets. It hasn't happened with smaller datasets.
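
For anyone who wants to check whether their assembly is in the affected size range: the `integer overflow in 'cumsum'` warning in the logs above means a running sum of integers went past R's 32-bit integer limit (2^31 - 1 = 2147483647), which would fit with the error only showing up on large datasets. Below is a rough Python sketch, not part of DAS Tool, that reports where the running total of contig lengths in a FASTA file first crosses that limit. The helper names `contig_lengths` and `first_overflow_index` are made up for illustration, and the link between this threshold and the data.table type error is an assumption based on the warning, not something confirmed in this thread.

```python
# Largest value an R integer can hold (32-bit signed).
R_INT_MAX = 2**31 - 1


def contig_lengths(fasta_lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def first_overflow_index(lengths, limit=R_INT_MAX):
    """Return the 0-based index at which the running sum first exceeds
    limit, or None if it never does."""
    total = 0
    for i, n in enumerate(lengths):
        total += n
        if total > limit:
            return i
    return None
```

To try it on the assembly from this issue, one could open `contigs.fa` and pass the file handle through both helpers, e.g. `first_overflow_index(contig_lengths(open("contigs.fa")))`. A result of `None` means the cumulative length stays within the 32-bit range.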

vinisalazar commented 11 months ago

Hi @cmks, coming in to report that the fix seems to have worked. We are no longer getting that error and DAS Tool finished successfully. Thank you for your responsiveness!

BTW, our only struggle was installing from source. We eventually figured it out, but it took some time to find and replace the DAS_Tool.R file in the R library directory (we had a conda-based installation).
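
In case it saves someone else the search, here is a small Python sketch of how the `DAS_Tool.R` file can be located inside a conda environment. It assumes the environment is activated, so that conda has set the `CONDA_PREFIX` environment variable, and that the file keeps the name `DAS_Tool.R`. The helper name `find_das_tool_scripts` is hypothetical, and where the file actually lives may differ between installations.

```python
import os
from pathlib import Path


def find_das_tool_scripts(env_prefix):
    """Return every regular file named DAS_Tool.R below env_prefix, sorted by path."""
    return sorted(p for p in Path(env_prefix).rglob("DAS_Tool.R") if p.is_file())


def conda_prefix():
    """Return the active conda environment prefix, or None if no environment is active."""
    return os.environ.get("CONDA_PREFIX")
```

Calling `find_das_tool_scripts(conda_prefix())` in an activated environment lists the candidate files. Backing up the original before replacing it makes it easy to roll back.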

Best, Vini

cmks commented 10 months ago

Great, thanks for your feedback! The fix is now part of version 1.1.7 in case you want to update your conda-based installation.