Closed rbharath closed 4 years ago
Any tips on how to install the GTC notebook pre-reqs? I get a missing package error in GTC_workshop_tutorial (launched jupyter from the root directory of the repo and navigated thru to contrib/dragonn in browser):
ModuleNotFoundError Traceback (most recent call last)
I can post installation instructions tomorrow. How about a 'dragonn' conda environment file to get past prereq conflicts?
-J
On Wed, Jan 3, 2018, 11:57 AM Joe notifications@github.com wrote:
Any tips on how to install the GTC notebook pre-reqs? I get a missing package error in GTC_workshop_tutorial (launched jupyter from the root directory of the repo and navigated thru to contrib/dragonn in browser):
ModuleNotFoundError Traceback (most recent call last) in ()
----> 1 from simulations import simulate_motif_density_localization
2 print(simulate_motif_density_localization.__doc__)

~/deepchem-fork/contrib/dragonn/simulations.py in ()
2 from collections import OrderedDict
3 import numpy as np
----> 4 import simdna
5 from simdna.synthetic import (
6 RepeatedEmbedder, SubstringEmbedder, ReverseComplementWrapper,
ModuleNotFoundError: No module named 'simdna'
When I try to install dragonn via conda I get an error, as my DeepChem environment is on Python 3.5:
(deepchem-fork) joe@powerspec:~/deepchem-fork/scripts$ conda install dragonn -c kundajelab
Fetching package metadata ...................
Solving package specifications: .
UnsatisfiableError: The following specifications were found to be in conflict:
- dragonn -> deeplift ==0.3 -> python 2.7*
- python 3.5*
Any suggestions?
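For anyone hitting the same error, a quick way to see which prerequisites are missing before opening the notebook is to probe for them from the same environment that launches Jupyter. This is a minimal sketch using only the standard library. The helper name missing_modules and the REQUIRED list are illustrative assumptions; the traceback above only confirms that simdna is needed.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed prerequisite list for illustration; only simdna is confirmed by the traceback.
REQUIRED = ["numpy", "simdna"]

print("Python", sys.version.split()[0])
for name in missing_modules(REQUIRED):
    print("missing:", name)
```

Running this inside the activated conda environment shows whether the interpreter Jupyter uses can see each package, which helps separate a missing install from an environment mix-up.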
@jisraeli A custom conda file would be well appreciated!
Eventually, I'd like to move the Dragonn support into the main library, but that will take time since we'll have to introduce dependencies carefully.
@LRParser I think I did some manual python 2.7 -> 3.5 conversions at some point. My recommendation is to just keep messing with it till things start to work. It wasn't too bad. We'll need to figure this stuff out before we can move support into deepchem proper though.
I'm happy to help out as well. I'm new to DragoNN, but have started some work with DeepChem. I've built plenty of conda packages in my day, so I could help there if appropriate.
I've also been working with DeepChem using Dockerfiles (python 2.7 and 3.5) that I built which use NVIDIA optimized tensorflow but also add in cairocffi (better looking RDKit molecule images) and requirements for pyGPGO, which don't seem to be in the latest Docker container. GitHub repo here.
Happy to work on adding DragoNN and the other additional libraries to DeepChem's provided Docker container.
Probably other ways I can help out as well as I get more familiar with both libraries.
Also, I think biopython will read FASTA files. Would there be interest in integrating that or do we feel there is a need to create our own parser?
@mlgill Great to hear you're interested in helping!
I think adding biopython integration would be great. My sense is that biopython is to bioinformatics what rdkit is to cheminformatics. We already depend on rdkit quite a bit, so it makes sense to use biopython as a complement.
My recommendation would be to get comfortable with deepchem development by submitting a small starter PR. Once you get the hang of our style, it should be relatively straightforward to figure out a design for the biopython support.
Re fasta reading - there is a function in dragonn that takes in fasta filename and returns numpy array with one hot encoded sequences: https://github.com/kundajelab/dragonn/blob/master/dragonn/utils.py#L121
MotifRNN should probably be last on the priority list - we haven't used it in practice in years.
Johnny - install tips would be much appreciated!
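To make the FASTA point above concrete, here is a rough sketch of reading a FASTA file and one-hot encoding the sequences into a numpy array. It is not the DragoNN function linked above. The names read_fasta and one_hot_encode, the ACGT channel order, and the (N, 4, L) output shape are all assumptions chosen for illustration, and the layout DragoNN or DeepChem expects may differ.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order, not taken from DragoNN


def read_fasta(path):
    """Return the sequences in a FASTA file, joining wrapped lines per record."""
    sequences, chunks = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A header starts a new record; flush the previous one.
                if chunks:
                    sequences.append("".join(chunks))
                    chunks = []
            else:
                chunks.append(line.upper())
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def one_hot_encode(sequences):
    """Encode equal-length DNA strings as a float array of shape (N, 4, L)."""
    if not sequences:
        raise ValueError("no sequences to encode")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    encoded = np.zeros((len(sequences), len(BASES), length), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, base in enumerate(seq):
            k = BASES.find(base)
            if k >= 0:  # ambiguous bases such as N stay all-zero
                encoded[i, k, j] = 1.0
    return encoded
```

A call such as one_hot_encode(read_fasta("peaks.fa")), where peaks.fa is a hypothetical file of equal-length sequences, would give one 4-by-L matrix per sequence. If biopython is adopted, its FASTA parser could replace read_fasta while the encoding step stays the same.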
@LRParser python2 or python3?
@jisraeli Would love tips on other models (if any) we should implement besides SequenceDNN.
@rbharath gkmSVM is useful for benchmarking the SVM models that used to be SOTA: https://github.com/kundajelab/dragonn/blob/master/dragonn/models.py#L336. The implementation in DragoNN assumes this dependency: https://github.com/Dongwon-Lee/lsgkm.
There aren't plans to integrate with DragoNN at present so closing. Will re-open if there's interest.
Basic integration of DragoNN models with DeepChem contrib was just merged in #979. There's a good chunk of work that will have to be done to improve integration. Let's use this issue to coordinate work.
Here are some potential TODOs:
- simulations.py, the file that generates synthetic DragoNN training data, was just removed from the simdna library (https://github.com/kundajelab/simdna/issues/4). We could give these simulation datasets a good home in DeepChem (somewhere in dc.moleculenet).
- The SequenceDNN class right now is very crude, and doesn't have the bells and whistles of the associated DragoNN class. Some performance tuning work will have to be done to match reported DragoNN numbers.
- dc.metrics would be very useful.

If you think you're interested in helping with the effort, please chime in on this thread.
CC @jisraeli: Would love your feedback on other ways we can improve integration.