blobtoolkit / blobtoolkit-docker

[Archived] Docker images for BlobToolKit

Blobtools add error #2

Closed AmaliT closed 4 years ago

AmaliT commented 4 years ago

Hi @rjchallis

I have pulled the container, packaged it as a Singularity image, and managed to create a dataset successfully. But when I try to add data such as BLAST results, I get the following error. This seems to occur only when I set --taxrule bestsum. Are you able to advise why this might be?

[Co_Assemblies_20]$ singularity exec ${BTOOLS} blobtools add    \
 --hits 13.host_filter/MG_HN.msp.1.5.ncbi.blastn.out    \
 --hits 13.host_filter/MG_HN.msp.1.5.diamond.blastx.unip.out    \
 --taxrule bestsum  \
 --taxdump taxdump 13.host_filter/MG_HN
Parsing taxdump
bestsum
Traceback (most recent call last):
  File "/blobtoolkit/blobtools2/lib/add.py", line 165, in <module>
    main()
  File "/blobtoolkit/blobtools2/lib/add.py", line 132, in main
    meta=meta)
  File "/blobtoolkit/blobtools2/lib/hits.py", line 311, in parse
    identifiers)
  File "/blobtoolkit/blobtools2/lib/hits.py", line 71, in apply_taxrule
    blank = [None] * len(identifiers.values)
AttributeError: 'list' object has no attribute 'values'

Thanks in advance

Cheers Amali

rjchallis commented 4 years ago

Hi Amali

This could be caused by the identifiers being missing from the dataset. The blobtools add/create command occasionally seems to remove the identifiers, but I've not been able to reproduce this behaviour to debug it. If this has happened here, then your BlobDir dataset directory probably has no identifiers.json file.

Removing the directory and re-running the blobtools create step is the best way to fix it, and may work even if you have encountered a different problem.
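Before removing anything, it can be worth checking whether identifiers.json is actually present and populated. The following is a minimal sketch that assumes only that the file is a JSON object with a "values" list; the helper name check_identifiers is made up for illustration and is not part of blobtools.

```python
import json
from pathlib import Path


def check_identifiers(blobdir):
    """Return the number of identifiers stored in a BlobDir.

    Hypothetical helper (not part of blobtools). Raises if
    identifiers.json is missing or has no non-empty "values" list.
    """
    path = Path(blobdir) / "identifiers.json"
    if not path.is_file():
        raise FileNotFoundError(f"{path} is missing; consider re-running blobtools create")
    with path.open() as handle:
        data = json.load(handle)
    # The file is assumed to look like {"values": [...], "keys": [...]}.
    values = data.get("values") if isinstance(data, dict) else None
    if not isinstance(values, list) or not values:
        raise ValueError(f"{path} has no usable 'values' list")
    return len(values)
```

Calling check_identifiers("13.host_filter/MG_HN") would then either return the number of sequence identifiers or say what is wrong with the file.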

Hope this helps

AmaliT commented 4 years ago

Hi @rjchallis

I had a look and the identifiers.json file was there. I also tried re-running after removing the directory, but got the same error. It works fine if I don't use the --taxrule flag. Do the FASTA headers need to be in a specific format?

[Co_Assemblies_20]$ ls 13.host_filter/MG_HN
 gc.json  identifiers.json  length.json  meta.json  ncount.json
[hraaxt@aklppf31:Co_Assemblies_20]$ head 13.host_filter/MG_HN/identifiers.json
{
 "values": [
  "NODE_1_length_118283_cov_1276.514227",
  "NODE_2_length_96275_cov_1083.140284",
  "NODE_3_length_70478_cov_1275.924791",
  "NODE_4_length_48545_cov_42.141972",
  "NODE_5_length_41497_cov_9.051269",
  "NODE_6_length_41347_cov_1209.736002",
  "NODE_7_length_39960_cov_7.390782",
  "NODE_8_length_39895_cov_1.417723",
[hraaxt@aklppf31:Co_Assemblies_20]$ tail 13.host_filter/MG_HN/identifiers.json
  "NODE_121981_length_1500_cov_0.390386",
  "NODE_121982_length_1500_cov_0.358339",
  "NODE_121983_length_1500_cov_0.313183",
  "NODE_121984_length_1500_cov_0.309541",
  "NODE_121985_length_1500_cov_0.306628"
 ],
 "keys": [

 ]
}

Cheers A

rjchallis commented 4 years ago

Thanks for checking the identifiers were OK. The naming format shouldn't matter so they look fine. I've managed to reproduce this and think I've spotted the bug. I should get it fixed and update the container image tomorrow.
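For anyone reading the traceback: the failing line reads .values from an object that turned out to be a plain list. The sketch below reproduces that class of error in isolation and shows one defensive pattern. The Field class and blank_like helper are hypothetical stand-ins, not the blobtools code, and this is not necessarily how the bug was actually fixed.

```python
class Field:
    """Hypothetical stand-in for an object that keeps its data in .values."""

    def __init__(self, values):
        self.values = values


def blank_like(identifiers):
    """Return a list of None with one entry per identifier.

    Accepts either a Field-like object or a bare list
    (an illustrative defensive pattern only).
    """
    values = getattr(identifiers, "values", identifiers)
    return [None] * len(values)


ids = ["NODE_1", "NODE_2", "NODE_3"]

# Reading .values from a bare list raises the same kind of error as in the traceback.
try:
    blank = [None] * len(ids.values)
except AttributeError as err:
    message = str(err)

# The defensive helper gives the same result for both input shapes.
from_list = blank_like(ids)
from_field = blank_like(Field(ids))
```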

rjchallis commented 4 years ago

I've just updated the BlobToolKit container image with a bug fix for --taxrule bestsum.

If you singularity pull docker://genomehubs/blobtoolkit:1.3.3 you should be able to run the command that was failing.

AmaliT commented 4 years ago

Great, thanks @rjchallis