MrOlm / drep

Rapid comparison and dereplication of genomes

Test_backend folder missing for test_suite.py #13

Closed jeffkimbrel closed 6 years ago

jeffkimbrel commented 7 years ago

I'm checking to see if dRep installed correctly, so I am running the test_suite.py script from the tests folder. I get an error:

FileNotFoundError: [Errno 2] No such file or directory: '/XXX/XXX/drep/tests/../tests/test_backend/ecoli_wd'

Indeed, this isn't in the tests folder. Is there a workaround to get the test scripts to work?

Thanks.

jeffkimbrel commented 7 years ago

I see this is similar to #11. I'll close this and wait to see if test_suite.py gets fixed in the future.

MrOlm commented 7 years ago

Hi Jeff,

Sorry about this inconvenience. All that's needed is a blank folder in the tests folder called test_backend.

You can either make it yourself, or download the new version (1.1.2) which will include it.
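
If you would rather script the first option, here is a minimal sketch in Python. The helper name `ensure_test_backend` is made up for illustration and is not part of dRep; it only assumes that the test suite needs an empty `test_backend` folder inside the tests folder, as described above.

```python
from pathlib import Path


def ensure_test_backend(tests_dir):
    """Create an empty 'test_backend' folder inside tests_dir if it is missing.

    Hypothetical helper, not part of dRep. tests_dir is the path to the
    drep/tests folder on your machine.
    """
    backend = Path(tests_dir) / "test_backend"
    # exist_ok=True makes the call safe to repeat; parents=True creates
    # any missing parent folders as well.
    backend.mkdir(parents=True, exist_ok=True)
    return backend
```

Calling `ensure_test_backend("path/to/drep/tests")` before running test_suite.py should be enough to get past the FileNotFoundError.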

Best, -Matt

jeffkimbrel commented 7 years ago

Thanks, that worked to get it started. I did get an error, however:

Traceback (most recent call last):
  File "test_suite.py", line 633, in <module>
    test_short()
  File "test_suite.py", line 617, in test_short
    cluster_test()
  File "test_suite.py", line 581, in cluster_test
    verifyCluster.run()
  File "test_suite.py", line 328, in run
    self.skipsecondary_test()
  File "test_suite.py", line 367, in skipsecondary_test
    assert compare_dfs(db1, db2), "{0} is not the same!".format('Mdb')
AssertionError: Mdb is not the same!

It isn't clear to me at which calculation this failed, but it was step4 after functional test 1 passed.

Let me know if I should open this up in a new issue.

Jeff

MrOlm commented 7 years ago

Hmm. What does the following command show? It could be that one of the dependencies is not working

$ dRep bonus test --check_dependencies
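
If that command itself does not run, a rough stand-in is to look for the executables on your PATH directly. This is only a sketch: it uses Python's `shutil.which`, which is not necessarily how dRep locates its dependencies, and the function names `locate_tools` and `report` are made up for illustration. The list of program names is the set dRep is expected to look for.

```python
import shutil

# External programs to look for on PATH.
TOOLS = ["mash", "nucmer", "checkm", "ANIcalculator", "prodigal", "centrifuge"]


def locate_tools(names=TOOLS):
    """Map each program name to its path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def report(locations):
    """Format one status line per program."""
    lines = []
    for name, location in locations.items():
        status = "all good" if location else "!!! ERROR !!!"
        # Pad the name with dots to a fixed width of 40 characters.
        lines.append(f"{name:.<40} {status} (location = {location})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report(locate_tools()))
```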


jeffkimbrel commented 6 years ago

Yep, it looks like there are dependencies missing, but the missing ones are listed as optional. I suppose that the test python script uses these dependencies?

$ dRep bonus test --check_dependencies
Loading work directory
Checking dependencies
mash.................................... all good        (location = /usr/local/bin/mash)
nucmer.................................. all good        (location = .../scripts/MUMmer3.23/nucmer)
checkm.................................. !!! ERROR !!!   (location = None)
ANIcalculator........................... !!! ERROR !!!   (location = None)
prodigal................................ all good        (location = .../scripts/prodigal)
centrifuge.............................. !!! ERROR !!!

MrOlm commented 6 years ago

Hmm, yes, some of those dependencies are used by the test suite, but with the ones you have working I wouldn't really expect it to crash there.

Is the program working when you use it on your own data?

jeffkimbrel commented 6 years ago

It seems like I keep running into problems with checkM not being usable. I am trying to run all of this on my laptop, and therefore can't run checkM because it requires >16 GB of memory. I actually do have it installed, but don't have all of the data files downloaded.

Running on my own files...

$ dRep dereplicate_wf dRep_test -g bins/*.fasta --skipCheckM

When I use the --skipCheckM flag I can get through the Filtering and Clustering steps, but it fails on the Choose step after it also attempts to run checkM (disregarding the flag).

Also, I manage python environments using Anaconda rather than pyenv. My default python is 3.4.5. I wonder if that is also messing something up... I think checkM is the only thing requiring python 2.X, correct?

Thanks for your help.

MrOlm commented 6 years ago

So a problem is that the choose module does require checkm. This is because that's how it decides which bin is the "best" from each cluster.

What is your goal with dRep? If your goal is to figure out which genomes are similar, I would suggest using the "compare_wf" option. If your goal is to generate a representative genome list, I would be happy to work with you to try and figure out a way to get checkM running.

-Matt


jeffkimbrel commented 6 years ago

I have figured out how to run checkM on NERSC... is it possible to use an "external" checkM results dataset with dRep?

My goals are pretty much what the advertised purpose is. I have tons of metagenomes that would be too computationally expensive to combine and co-assemble. So I want to take bins from either single metagenomes, or co-assembled replicates, and "merge" the bins.

MrOlm commented 6 years ago

Yes- there is a way to use "external" checkM results.

When using the dereplicate_wf (probably what you want), there's an option --Chdb which can be used to provide external checkM results.

They need to be in the --tab_table format, though. The checkM command to generate this is:

checkm qa --tab_table -o 2

An example of how it should look is attached.

Finally, make sure that for the "Bin Id" column, you have the name of the genome WITH the file extension, and WITHOUT the path to the genome (as is the case in the example sheet provided).
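
If your table doesn't already look like that, here is a rough sketch of how the "Bin Id" column could be fixed up in Python. The function name `fix_bin_ids`, the default `.fasta` extension, and the tab delimiter are assumptions for illustration; adjust them to match your genome files and the attached example.

```python
import csv
import io
import os


def fix_bin_ids(table_text, extension=".fasta", delimiter="\t"):
    """Rewrite the 'Bin Id' column so each entry is a bare file name
    with its extension (no directory path).

    Hypothetical helper, not part of dRep or checkM.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Drop any leading directory components.
        name = os.path.basename(row["Bin Id"])
        # Add the extension only if it is not already there.
        if not name.endswith(extension):
            name += extension
        row["Bin Id"] = name
        writer.writerow(row)
    return out.getvalue()
```

All other columns are passed through unchanged, so the output keeps the same layout as the input table.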

Best, -Matt

Chdb.csv.zip