Closed jregalad closed 8 years ago
Hi Julian,
I understand the implementation isn’t the best (time/skill/priority constraints), but that is one reason why the Docker container has the --shell
command. If you’d like to play with the code pre-installed, use docker run -i -t dkoslicki/metapalette --shell
.
There are a number of reasons why I chose not to use the current version of Bcalm (BCALM 2):
As for the query_per_sequence
code, if you take a look at MetaPalette/src/QueryPerSeq/query_per_sequence.cc
, you will see that the output is indeed an integer (not a string as in https://github.com/gmarcais/Jellyfish/tree/master/examples/query_per_sequence). I’ve modified the query_per_sequence.cc
code to return just what I need (the counts). Hence why the readme
says to compile this code (and doesn’t point to the Jellyfish repository).
As for the bugs you found, I would appreciate a pull request if you would like to contribute your fixes. Otherwise feel free to let me know what the other main issues with Train.py
were and I can see if I can resolve them.
Thanks.
Very good idea but implementation is not the best. Took me quite a while to get Training.py to run. Added some error handling to the subprocess.checkcall() commands. Also, code does not work with new versions of bcalm.
Main bug is with Classify.py:
On line 103 there is a `def form_y(file_name, kmer_size):
cmd = query_per_sequence_binary +" "+ os.path.join(output_folder,file_base_name+"-"+str(kmer_size)+"mers.jf") + " " + os.path.join(data_dir,"Bcalms",file_name+"-30mers.bcalm.fa")
res = subprocess.check_output(cmd, shell = True)
return int(res)`
Why are you returning int(res) when res stores the output of query_per_sequence which is a string (fasta)? https://github.com/gmarcais/Jellyfish/tree/master/examples/query_per_sequence
After this code fails because the array Y can not be created.
Hope we can sort this out, Julian