Closed lkothera closed 5 years ago
Hi Linda,
I'm sorry to hear about the issue. I haven't seen this before, and it's not obvious to me what the cause of the problem is if you're able to see that the .gz file is there. It may have something to do with how CATCH was installed on your platform.
Can you start by running
ls -l /apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz
and pasting the results, so I can see the file size? If the size is small, it may consist of only the hash and suggest the data has not been pulled via git lfs pull, although I'm not sure if this would yield the FileNotFoundError.
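If it is easier than reading file sizes, the same check can be sketched in Python. This is only an illustration, not part of CATCH: the function name classify_dataset_file is hypothetical, and it assumes that a Git LFS pointer file begins with the standard "version https://git-lfs.github.com/spec/v1" line and that real gzip data begins with the bytes 1f 8b.

```python
from pathlib import Path

# First line of a Git LFS pointer file (assumed from the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def classify_dataset_file(path):
    """Return 'missing', 'lfs-pointer', 'gzip', or 'unknown' for a path.

    Hypothetical helper for illustration; it only inspects the first bytes.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"
```

Calling classify_dataset_file on the zika.fasta.gz path from the error message would distinguish a file that is absent ("missing") from one that was never pulled ("lfs-pointer").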
Hi Hayden, Thanks for getting back to me. I am unable to get that line of code you pasted below to work. I typed it a couple of times and got “No such file or directory”. The way I think I can see the file size for the zika.fasta.gz file was to type
ls -l /apps/x86_64/catch/catch/catch/datasets/data
I did the cd command along the way. Not sure if that matters.
And the result line for Zika is
-rwxr-xr-x. 1 root root 764974 2019-03-26 13:53 zika.fasta.gz
There seems to be a lot to the path that the program wants to take. Let me know please if it’s something we need to fix on our end.
Thanks, Linda
Linda Kothera, PhD Ecology and Entomology Team Arboviral Diseases Branch Division of Vector-Borne Diseases Center for Emerging Zoonotic Infectious Diseases Centers for Disease Control and Prevention 3156 Rampart Road Fort Collins, CO 80521 970-225-4216 lkothera@cdc.gov
I'm not certain, but based on the path you provided (containing an egg) it looks like CATCH may have been installed by your team using easy_install, which I haven't used or tested. As noted in the README (https://github.com/broadinstitute/catch#downloading-and-installing), I'd recommend pip -- in particular (but optionally), from within a virtual environment. Installing via conda (https://github.com/broadinstitute/catch#alternative-approach-installing-with-conda) is another option.
It looks like the design.py on your PATH is in a different directory than where the data lives. One quick fix might be to try running python /apps/x86_64/catch/bin/design.py zika ..., instead of design.py zika .... Can you let me know if that works?
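To see which copy of the package Python actually imports, and whether the dataset sits next to it, something like the following could help. It is a rough sketch: find_package_file is a hypothetical helper, and the package name "catch" and the relative path datasets/data/zika.fasta.gz are taken from the traceback in this thread.

```python
import importlib.util
from pathlib import Path


def find_package_file(package, relative_path):
    """Return (path, exists) for a file expected inside an importable package.

    Hypothetical helper: it looks only in the first directory that Python
    would search for the package's submodules.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"{package!r} is not an importable package here")
    package_dir = Path(list(spec.submodule_search_locations)[0])
    candidate = package_dir / relative_path
    return candidate, candidate.is_file()


if __name__ == "__main__":
    try:
        path, exists = find_package_file("catch", "datasets/data/zika.fasta.gz")
        print(path, "exists" if exists else "MISSING")
    except ImportError as err:
        print(err)
```

If the printed path points into site-packages and reports MISSING while the file is visible elsewhere, the imported package and the data live in different places.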
That does not seem to work. Here’s the code and the error:
fph6@biolinux> python /apps/x86_64/catch/bin/design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta --verbose
python: can't open file '/apps/x86_64/catch/bin/design.py': [Errno 2] No such file or directory
I then did this (added an extra /catch to the line):
fph6@biolinux> python /apps/x86_64/catch/catch/bin/ design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta –verbose
Which returned this after three lines of other output:
/apps/x86_64/python/3.6.1/bin/python: can't find '__main__' module in '/apps/x86_64/catch/catch/bin/'
I looked around for __main__ and can’t find it. Here’s what’s in '/apps/x86_64/catch/catch/bin/'
fph6@biolinux> cd /apps/x86_64/catch/catch/bin
fph6@biolinux> ls
analyze_probe_coverage.py design_naively.py design.py pool.py
Linda
There's a space in python /apps/x86_64/catch/catch/bin/ design.py between bin/ and design.py. Does it work if you run it without that space?
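For background on why the space matters: with the space, the interpreter is handed the directory as the thing to run, and design.py becomes an ordinary argument. When given a directory, Python looks for a __main__.py file inside it, which is where the "can't find '__main__' module" message comes from. A small sketch that reproduces this in a temporary directory (the bin/design.py layout here is a stand-in, not the real CATCH install):

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(*args):
    """Run the current interpreter with the given arguments; return (code, stderr)."""
    result = subprocess.run([sys.executable, *args], capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


with tempfile.TemporaryDirectory() as tmp:
    bin_dir = Path(tmp) / "bin"
    bin_dir.mkdir()
    (bin_dir / "design.py").write_text("print('ran design.py')\n")

    # With a space: the directory is treated as the script, design.py as an argument.
    code_with_space, err_with_space = run_python(f"{bin_dir}/", "design.py")
    # Without the space: the interpreter is handed the script file itself.
    code_no_space, err_no_space = run_python(str(bin_dir / "design.py"))

print(code_with_space, err_with_space)
print(code_no_space, err_no_space)
```

The working directory does not affect this particular error, since the path passed to python is absolute.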
It does not seem to work when I do that.
fph6@biolinux> python /apps/x86_64/catch/catch/bin/ design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta -- verbose
/apps/x86_64/python/3.6.1/bin/python: can't find '__main__' module in '/apps/x86_64/catch/catch/bin/'
There is a space between python and /apps and another between bin/ and design.py.
Does it matter what directory I’m in?
Geez. I misread your email. Hang on.
OK, here’s the code and results without the space between bin/ and design.py
fph6@biolinux> python /apps/x86_64/catch/catch/bin/design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta --verbose
2019-03-27 16:10:50,320 - catch.utils.seq_io [INFO] Reading fasta file /apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz
Traceback (most recent call last):
  File "/apps/x86_64/catch/catch/bin/design.py", line 811, in <module>
Unfortunately, I think this is going to be tough to resolve given how it was installed. As I mentioned earlier, because of the egg file in the site-packages directory, I suspect that CATCH was installed using Distutils (python setup.py install) or with easy_install. I have not tested it this way, and can't recommend it. The basic problem, when installing this way, is that the installation is copying Python files into the egg file, but not the data -- and consequently the Python modules are unable to locate the data, which would normally be in the same directory structure. These installation methods should be fine if you do not plan to use the data distributed with CATCH, so you could alternatively move on to just use your own input FASTA files.
I think this will be easiest to resolve by asking your compute team if they could reinstall CATCH, using pip, as recommended in the README (https://github.com/broadinstitute/catch/blob/master/README.md): via pip install -e . or pip install --user -e . (Either way, the -e is needed to use the data distributed with the package.) It would also be helpful if they could run the test suite, as described in the README, to verify that everything is working correctly.
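To make the mechanism concrete, here is a toy simulation of what I suspect is happening. It is not CATCH's code: demo_pkg, dataset_path and copy_python_files_only are made up for illustration, and the "install" is just a copy of the .py files into a separate directory. A loader that resolves data relative to the package directory finds the file in the checkout (which is what an editable -e install points at) but not in the copied package.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def dataset_path(package_dir, name):
    """Resolve a dataset relative to the package directory (hypothetical loader)."""
    return Path(package_dir) / "datasets" / "data" / f"{name}.fasta.gz"


def copy_python_files_only(src_pkg, dst_pkg):
    """Mimic an install that packages .py files but leaves data files behind."""
    for py_file in Path(src_pkg).rglob("*.py"):
        target = Path(dst_pkg) / py_file.relative_to(src_pkg)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(py_file, target)


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in source checkout with one package and one gzipped dataset.
    source_pkg = Path(tmp) / "checkout" / "demo_pkg"
    data_dir = source_pkg / "datasets" / "data"
    data_dir.mkdir(parents=True)
    (source_pkg / "__init__.py").write_text("")
    (source_pkg / "datasets" / "__init__.py").write_text("")
    with gzip.open(data_dir / "zika.fasta.gz", "wt") as f:
        f.write(">example\nACGT\n")

    # A stand-in for the installed copy: Python files only, no data.
    installed_pkg = Path(tmp) / "site-packages" / "demo_pkg"
    copy_python_files_only(source_pkg, installed_pkg)

    found_in_checkout = dataset_path(source_pkg, "zika").is_file()
    found_in_install = dataset_path(installed_pkg, "zika").is_file()

print("checkout:", found_in_checkout)
print("installed copy:", found_in_install)
```

This would also explain why the .gz file is visible under /apps/x86_64/catch/catch/catch/datasets/data while the path inside the egg does not exist.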
Yes, it sounds like that is what’s needed. Thank you again for the assistance.
Also, I have a couple of questions about wet lab work from your recent paper. Would someone be able to help me with some details of the steps and reagents involved between using the hybridization probes and using the MiSeq reagent kit? If so can I get the proper contact info?
Linda
Yes, of course. Katie Siddle (kjsiddle@broadinstitute.org), my co-first author on the paper, is the right person to reach out to about those questions. Or you can email me (hayden@mit.edu) and I'll pass them along.
Thank you!
Hi, novice Linux user here.
I work for the CDC and our scientific computing people have installed CATCH on our biolinux platform. I have loaded CATCH and was trying to run the line of code to have the program make probes for the installed Zika virus data set. I'm getting error messages that seem to say the .gz file can't be found, although if I move around the directories, I can see the .gz file that is supposed to be used to generate the probe designs.
Here is the line of code and the error messages:
fph6@biolinux> design.py zika -pl 75 -m 2 -l 60 -e 50 -o zika-probes.fasta --verbose
2019-03-26 15:09:11,298 - catch.utils.seq_io [INFO] Reading fasta file /apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz
Traceback (most recent call last):
  File "/apps/x86_64/catch/catch/bin/design.py", line 811, in <module>
    main(args)
  File "/apps/x86_64/catch/catch/bin/design.py", line 60, in main
    genomes_grouped += [seq_io.read_dataset_genomes(dataset)]
  File "/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/utils/seq_io.py", line 71, in read_dataset_genomes
    seqs = list(read_fasta(fn).values())
  File "/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/utils/seq_io.py", line 152, in read_fasta
    with gzip.open(fn, 'rt') as f:
  File "/apps/x86_64/python/3.6.1/lib/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/apps/x86_64/python/3.6.1/lib/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/apps/x86_64/python/3.6.1/lib/python3.6/site-packages/catch-v1.2.0_20_gbf97305_dirty-py3.6.egg/catch/datasets/data/zika.fasta.gz'
Can you help? Thanks, Linda