gjeunen / reference_database_creator

creating reference databases for amplicon sequencing
MIT License

CRABS in Docker #30

Closed: lepatrick1 closed this issue 1 year ago

lepatrick1 commented 1 year ago

Apologies in advance; I'm sure my issue is pretty pathetic, but I'm using Docker for the first time to try to run CRABS and I've not done much with the Command Prompt before. I'm having trouble getting CRABS to run on Docker.

I've gotten Docker up and running on Windows 10 and have managed to load the welcome and getting started images and made them into containers. I then used Command Prompt to install CRABS, which I can then see as an image on Docker. However, when I run CRABS in Docker or use the Command Prompt to run the first three lines of help code from your tutorial, nothing happens and I get the same error message.

Command Prompt:

C:\Users\lorel>docker run --rm -it quay.io/swordfish/crabs:0.1.4
Traceback (most recent call last):
  File "/usr/local/bin/crabs", line 1428, in <module>
    main()
  File "/usr/local/bin/crabs", line 1425, in main
    args.func(args)
AttributeError: 'Namespace' object has no attribute 'func'

Docker (when trying to run from image to container):

2023-07-03 14:43:22 Traceback (most recent call last):
2023-07-03 14:43:22   File "/usr/local/bin/crabs", line 1428, in <module>
2023-07-03 14:43:22     main()
2023-07-03 14:43:22   File "/usr/local/bin/crabs", line 1425, in main
2023-07-03 14:43:22     args.func(args)
2023-07-03 14:43:22 AttributeError: 'Namespace' object has no attribute 'func'

I've tried deleting everything and reloading/reinstalling several times, but still have the same issue.

Are you able to tell me what I've done wrong? Do I need to provide additional information? Is there a tutorial you can point me to?

Thanks in advance!

gjeunen commented 1 year ago

Hello @lepatrick1,

Apologies for the issues you're running into. As I'm not that familiar with Docker itself, I'll refer you to @hughcross who developed the Docker installation. @hughcross, could you please have a look at the error message to determine what's going on?

Thanks, Gert-Jan

hughcross commented 1 year ago

Hi @lepatrick1 ,

Thanks for trying out Crabs! First off, your question is not pathetic. Docker can be tricky and many developers build their docker images differently, so you sometimes have to adjust for new programs. I also notice that I have not yet added a Windows tutorial. I will try to do that by the end of the summer, but feel free to ask questions here and we will try to get back to you.

I was able to replicate your error on my Windows machine. I believe the problem is that you did not enter a command after the image. The quay.io/swordfish/crabs:0.1.4 part just finds the image, but you still need a crabs command to do something. For example, to get the help command:

docker run --rm -it quay.io/swordfish/crabs:0.1.4 crabs -h 

Notice the crabs -h after the image name. Some Docker images don't require a command after the image name, but ours does. You should be able to follow most of the Mac tutorial, but in the Windows terminal you change the line-continuation character \ to a ` (the backtick, the key next to the 1 key on most keyboards). As an example:

docker run --rm -it `
  -v $(pwd):/data `
  --workdir="/data" `
  quay.io/swordfish/crabs:0.1.4 `
  crabs db_download `
  --source taxonomy

That should work, but let me know. Notice in this last command that you still have the crabs command crabs db_download after the image name.
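For anyone wondering why leaving out the command gives AttributeError: 'Namespace' object has no attribute 'func', the pattern is typical of Python's argparse when subcommands are used. The sketch below is not the CRABS source. The names build_parser and run and the single db_download subcommand are made up for illustration, and it assumes CRABS dispatches through set_defaults(func=...), which the args.func(args) line in the traceback suggests but does not prove.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a CLI with subcommands (not the real CRABS parser).
    parser = argparse.ArgumentParser(prog="crabs")
    subparsers = parser.add_subparsers()
    download = subparsers.add_parser("db_download")
    download.add_argument("--source")
    # Each subcommand stores the function that should handle it.
    download.set_defaults(func=lambda args: f"downloading {args.source}")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    try:
        # With no subcommand on the command line, 'func' was never set.
        return args.func(args)
    except AttributeError as err:
        return f"AttributeError: {err}"


if __name__ == "__main__":
    print(run([]))
    print(run(["db_download", "--source", "taxonomy"]))
```

Calling run([]) mirrors docker run with only the image name, and the second call mirrors adding crabs db_download --source taxonomy after it.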

Hopefully, this will help you get started. Let us know how it goes. One other thing that helps me with Windows is that I use Visual Studio Code and run commands from the terminal within that program. I am not as familiar with Windows but using VS Code (as I do on all operating systems) helps. But there are many options.

lepatrick1 commented 1 year ago

Thanks for the information! I've managed to get through the tutorial to the point of trying to output the taxonomy table (the last step of the tutorial before the visualizations). I'm now using Visual Studio Code. It throws an error when I use

-v $(pwd):/data `

so instead I've been using

--volume C:\Users\lorel\reference_database_creator:/data `

I also can't get the following code to work:

PS C:\Users\lorel\reference_database_creator> TAX= 'C:\Users\lorel\reference_database_creator\taxonomy_files'
TAX= : The term 'TAX=' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was 
included, verify that the path is correct and try again.
At line:1 char:1
+ TAX= 'C:\Users\lorel\reference_database_creator\taxonomy_files'
+ ~~~~
    + CategoryInfo          : ObjectNotFound: (TAX=:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

I've tried modifying the code to get the taxonomy table to output, which is where I'm currently stuck:

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
>>   --volume C:\Users\lorel\reference_database_creator:/data `
>>   --volume C:\Users\lorel\reference_database_creator:/src `
>>   --workdir="/data" `
>>   --cpus 4 `
>>   quay.io/swordfish/crabs:0.1.4 `
>>   crabs assign_tax `
>>     --input amanita_its1pga.fasta `
>>     --output amanita_its1.tsv `
>>     --acc2tax /src/nucl_gb.accession2taxid `
>>     --taxid /src/nodes.dmp `
>>     --name /src/names.dmp

retrieving accession numbers from amanita_its1pga.fasta
 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 760775/793213 [00:00<00:00, 12796342.77it/s]
found 2947 accession numbers in amanita_its1pga.fasta
reading /src/nucl_gb.accession2taxid into memory
Traceback (most recent call last):
  File "/usr/local/bin/crabs", line 1428, in <module>
    main()
  File "/usr/local/bin/crabs", line 1425, in main
    args.func(args)
  File "/usr/local/bin/crabs", line 511, in assign_tax
    acc2tax, taxid, name, no_acc = tax2dict(ACC2TAX, TAXID, NAME, accession)
  File "/usr/local/bin/function/module_assign_tax.py", line 14, in tax2dict
    with tqdm(total = os.path.getsize(acc2tax)) as pbar:
  File "/usr/lib/python3.8/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/src/nucl_gb.accession2taxid'

The last line of the error message keeps coming up regardless of how I tweak the code (as long as I haven't messed the code up so badly it doesn't run at all, which has absolutely happened).

Any ideas on how to get past this error?

Thanks!

Lorelei

hughcross commented 1 year ago

Hi @lepatrick1 ,

Let me try to help. I will address your problems in turn. The first problem is that using $(pwd) won't work in Windows. I believe that is just linux/mac. I should have caught that. Sorry. It looks like you solved that. I will check for the Windows shell alternative.
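One way to sidestep shell-specific syntax altogether is to build the docker command from Python, where the current folder comes from pathlib instead of $(pwd). This is only a sketch: crabs_docker_command is a hypothetical helper, not part of CRABS, and it assumes Python and Docker are both available on the host. The image name and the /data mount point are the ones used earlier in this thread.

```python
import subprocess
from pathlib import Path

IMAGE = "quay.io/swordfish/crabs:0.1.4"


def crabs_docker_command(crabs_args, host_dir=None, image=IMAGE):
    """Build (but do not run) a docker command that mounts host_dir at /data."""
    host_dir = Path(host_dir) if host_dir is not None else Path.cwd()
    return [
        "docker", "run", "--rm",
        # -it is left out here; add it if you want an interactive terminal.
        "-v", f"{host_dir}:/data",
        "--workdir=/data",
        image,
        "crabs", *crabs_args,
    ]


if __name__ == "__main__":
    command = crabs_docker_command(["db_download", "--source", "taxonomy"])
    print(" ".join(command))
    # Uncomment to actually start the container:
    # subprocess.run(command, check=True)
```

Because the arguments are passed as a list, no line-continuation characters or shell variables are needed.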

For the second, it looks like you are trying to assign the variable TAX to the folder where the taxonomy files are, as in the tutorial. You have a space between TAX= and the folder; there should be no space. Try:

TAX='C:\Users\lorel\reference_database_creator\taxonomy_files'

In any case, it looks like you figured it out without the variable, so that leaves your last command. I think the problem there is that in the second --volume argument you point to the main folder, but it seems you downloaded the taxonomy files to a subfolder called taxonomy_files. Crabs can't find these files as a result. Try this:

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
   --volume C:\Users\lorel\reference_database_creator:/data `
   --volume C:\Users\lorel\reference_database_creator\taxonomy_files:/src `
   --workdir="/data" `
   --cpus 4 `
   quay.io/swordfish/crabs:0.1.4 `
   crabs assign_tax `
     --input amanita_its1pga.fasta `
     --output amanita_its1.tsv `
     --acc2tax /src/nucl_gb.accession2taxid `
     --taxid /src/nodes.dmp `
     --name /src/names.dmp

If you downloaded your taxonomy files to that subfolder then it should work. Check the names of the downloaded files.
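To check by hand which Windows file a container path such as /src/nucl_gb.accession2taxid refers to, it can help to translate it back through the --volume mappings. The sketch below only does that translation with pathlib. host_path_for is a hypothetical helper, and the mounts dictionary simply restates the two --volume arguments from the command above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def host_path_for(container_path, mounts):
    """Translate a path inside the container to the Windows path it maps to.

    mounts maps a host folder to its mount point, like --volume HOST:MOUNT.
    Returns None when the path is not under any mount point.
    """
    target = PurePosixPath(container_path)
    for host_dir, mount_point in mounts.items():
        try:
            relative = target.relative_to(mount_point)
        except ValueError:
            continue  # this mount does not contain the path
        return PureWindowsPath(host_dir).joinpath(*relative.parts)
    return None


if __name__ == "__main__":
    mounts = {
        r"C:\Users\lorel\reference_database_creator": "/data",
        r"C:\Users\lorel\reference_database_creator\taxonomy_files": "/src",
    }
    print(host_path_for("/src/nucl_gb.accession2taxid", mounts))
```

If the printed Windows path does not exist on disk, crabs will report the same FileNotFoundError inside the container.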

Let me know how that goes.

I am sorry I haven't had a chance to add more information for Windows users. I won't have a chance to do a full tutorial until the fall (northern hemisphere). Thank you for your patience. It looks like you have done pretty well figuring things out, but don't hesitate to ask. Your input helps us as well!

Cheers,

Hugh

lepatrick1 commented 1 year ago

Hugh,

Thanks again for getting back to me!

The code you sent to make the TAX variable did not work:

PS C:\Users\lorel\reference_database_creator> TAX='C:\Users\lorel\reference_database_creator\taxonomy_files'
TAX=C:\Users\lorel\reference_database_creator\taxonomy_files : The term 'TAX=C:\Users\lorel\reference_database_creator\taxonomy_files' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1

At this point, I don't care about that, just wanted you to know.

I still can't get the last step to work regardless of where on my computer the taxonomy files are; at this point I have copied them to several folders on my computer to ensure that isn't the problem. I've also checked the spelling of the nucl_gb.accession2taxid and tried adding .gz (which caused a utf8 error) with no luck:

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
  --volume C:\Users\lorel\reference_database_creator:/data `
  --volume C:\Users\lorel\reference_database_creator:/src `
  --workdir="/data" `
  --cpus 4 `
  quay.io/swordfish/crabs:0.1.4 `
  crabs assign_tax `
    --input amanita_its1pga.fasta `
    --output amanita_its1.tsv `
    --acc2tax /src/nucl_gb.accession2taxid `
    --taxid /src/nodes.dmp `
    --name /src/names.dmp

retrieving accession numbers from amanita_its1pga.fasta
 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 761617/794088 [00:00<00:00, 12340848.17it/s]
found 2950 accession numbers in amanita_its1pga.fasta
reading /src/nucl_gb.accession2taxid into memory
Traceback (most recent call last):
  File "/usr/local/bin/crabs", line 1428, in <module>
    main()
  File "/usr/local/bin/crabs", line 1425, in main
    args.func(args)
  File "/usr/local/bin/crabs", line 511, in assign_tax
    acc2tax, taxid, name, no_acc = tax2dict(ACC2TAX, TAXID, NAME, accession)
  File "/usr/local/bin/function/module_assign_tax.py", line 14, in tax2dict
    with tqdm(total = os.path.getsize(acc2tax)) as pbar:
  File "/usr/lib/python3.8/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/src/nucl_gb.accession2taxid'

Any additional advice?

Thanks,

Lorelei

lepatrick1 commented 1 year ago

Hugh (@hughcross),

New issue but while attempting to do the same thing. I've managed to download the sequences I need for my project and am now trying to assign taxonomy and getting this error message about "acc2tax":

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
  --volume C:\Users\lorel\reference_database_creator:/data `
  --workdir="/data" `
  --cpus 4 `
  quay.io/swordfish/crabs:0.1.4 `
  crabs assign_tax `
    --input Insecta_CO1pga.fasta `
    --output Insecta_its1.tsv `
    --acc2tax nucl_gb.accession2taxid `
    --taxid nodes.dmp `
    --name names.dmp `
    --missing missing_taxa.tsv
At line:9 char:8
+     --acc2tax nucl_gb.accession2taxid `
+       ~
Missing expression after unary operator '--'.
At line:9 char:8
+     --acc2tax nucl_gb.accession2taxid `
Unexpected token 'acc2tax' in expression or statement.
    + FullyQualifiedErrorId : MissingExpressionAfterOperator

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
  --volume C:\Users\lorel\reference_database_creator:/data `
  --volume C:\Users\lorel\reference_database_creator:/src `
  --workdir="/data" `
  --cpus 4 `
  quay.io/swordfish/crabs:0.1.4 `
  crabs assign_tax `
    --input Insecta_CO1pga.fasta `
    --output Insecta_its1.tsv `
    --acc2tax /src/nucl_gb.accession2taxid `
    --taxid /src/nodes.dmp `
    --name /src/names.dmp `
    --missing missing_taxa.tsv
At line:10 char:8
+     --acc2tax /src/nucl_gb.accession2taxid `
+       ~
Missing expression after unary operator '--'.
At line:10 char:8
+     --acc2tax /src/nucl_gb.accession2taxid `
Unexpected token 'acc2tax' in expression or statement.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : MissingExpressionAfterOperator

This happens regardless of whether I use the src folder (what's the purpose of using this folder?).

This should be the last step I need before I'm ready to move forward with my project so any help or advice would be great! TIA!
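A guess at the cause, not confirmed anywhere in this thread: in PowerShell the backtick only continues a line when it is the very last character on that line. If a space or tab follows it, the next line is parsed as a new statement, and a line starting with -- can then produce "Missing expression after unary operator '--'". The sketch below scans a pasted command for that pattern. broken_continuations is a hypothetical helper written for this example.

```python
def broken_continuations(command_text):
    """Return 1-based line numbers where a backtick is followed by spaces or tabs."""
    problems = []
    for number, line in enumerate(command_text.splitlines(), start=1):
        trimmed = line.rstrip(" \t")
        # A trailing backtick that is not the final character will not
        # continue the line in PowerShell.
        if trimmed.endswith("`") and trimmed != line:
            problems.append(number)
    return problems


if __name__ == "__main__":
    pasted = (
        "docker run --rm -it `\n"
        "  --output Insecta_its1.tsv ` \n"  # note the space after the backtick
        "  --acc2tax nucl_gb.accession2taxid `\n"
        "  --taxid nodes.dmp"
    )
    print(broken_continuations(pasted))
```

Any line number it reports is a place to delete the whitespace after the backtick before running the command again.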

lepatrick1 commented 1 year ago

I don't know what you all did, but it finally worked! Thanks!

PS C:\Users\lorel\reference_database_creator> docker run --rm -it `
  --volume C:\Users\lorel\reference_database_creator:/src `
  --workdir="/src" `
  --cpus 4 `
  quay.io/swordfish/crabs:0.1.4 `
  crabs assign_tax `
    --input Insecta_CO1pga.fasta `
    --output Insecta_its1.tsv `
    --acc2tax /src/nucl_gb.accession2taxid `
    --taxid /src/nodes.dmp `
    --name /src/names.dmp `
    --missing missing_taxa.tsv

retrieving accession numbers from Insecta_CO1pga.fasta
 95%|███████████████████████████████████████████████████████████████████████████████▋ | 25680600/27062808 [00:02<00:00, 10691597.59it/s]
found 116612 accession numbers in Insecta_CO1pga.fasta
reading /src/nucl_gb.accession2taxid into memory
100%|██████████████████████████████████████████████████████████████████████████████| 11932025964/11932025964 [12:35<00:00, 15792684.67it/s]
reading /src/nodes.dmp into memory
100%|███████████████████████████████████████████████████████████████████████████████████| 190266827/190266827 [00:19<00:00, 9819409.88it/s]
reading /src/names.dmp into memory
100%|██████████████████████████████████████████████████████████████████████████████████| 233742549/233742549 [00:20<00:00, 11536535.96it/s]
processed 116612 entries in /src/nucl_gb.accession2taxid
processed 2516767 entries in /src/nodes.dmp
processed 2516767 entries in /src/names.dmp
assigning a tax ID number to 116612 accession numbers from Insecta_CO1pga.fasta
did not find 0 accession numbers in /src/nucl_gb.accession2taxid
116612 accession numbers resulted in 21364 unique tax ID numbers
generating taxonomic lineages for 21364 tax ID numbers
assigning a taxonomic lineage to 116612 accession numbers
written 116612 entries to Insecta_its1.tsv
 95%|███████████████████████████████████████████████████████████████████████████████▋ | 25680600/27062808 [00:02<00:00, 11543692.44it/s]
writting 0 sequences with missing taxonomic info to missing_taxa.tsv