Open eileenwho opened 6 years ago
also add in more defensive programming
if possible, make sure that blast won't copy in sequences that are already in the file, etc. See the notes on blast30ribo for additional things.
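One way the "don't copy in what's already there" check could look, as a rough sketch only. The function names `existing_ids` and `append_new_records` are made up for illustration, and it assumes a sequence is identified by the first word of its `>` defline and that the target file, if it exists, ends with a newline.

```python
def existing_ids(fasta_path):
    """Collect the identifiers (first word of each '>' defline) already in a FASTA file."""
    ids = set()
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:
                        ids.add(fields[0])
    except FileNotFoundError:
        # A missing target file simply means nothing has been copied in yet.
        pass
    return ids


def append_new_records(records, fasta_path):
    """Append (defline, sequence) pairs whose identifier is not yet in fasta_path.

    Returns the list of identifiers that were actually written.
    """
    seen = existing_ids(fasta_path)
    written = []
    with open(fasta_path, "a") as handle:
        for defline, sequence in records:
            text = defline.lstrip(">")
            fields = text.split()
            if not fields:
                continue  # defensive: skip records with an empty defline
            seq_id = fields[0]
            if seq_id in seen:
                continue  # already in the file, or repeated within this batch
            handle.write(f">{text}\n{sequence}\n")
            seen.add(seq_id)
            written.append(seq_id)
    return written
```

Reading the target file once and keeping the identifiers in a set keeps the per-record check cheap, which matters given the worry about speed.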
1. speed up blast
2. make sure writing to the output file doesn't take too much time
3. add an option to check for redundant sequence names and delete them

Thoughts on extractCopyProteinSeq. Priorities are: do blast and copy over in 1 step so there are just fewer files around, with an output file that saves every line written to a file, the name of that file, and the name of the original file.
OK, currently the output file has everything. Could maybe get rid of a few lines in the blast output through https://www.ncbi.nlm.nih.gov/books/NBK279682/
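For the redundant sequence names option, a minimal sketch. `drop_redundant_names` is a hypothetical helper, and it assumes the "name" is the first word of the defline and that the first record with a given name is the one to keep; the notes don't say which copy should survive.

```python
def drop_redundant_names(lines):
    """Yield FASTA lines, skipping every record whose name was already seen.

    The name is taken to be the first word after '>' (an assumption).
    """
    seen = set()
    keep = True  # lines before the first defline are passed through
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name not in seen
            seen.add(name)
        if keep:
            yield line


if __name__ == "__main__":
    sample = [">a 1\n", "MK\n", ">b\n", "AA\n", ">a 2\n", "CC\n"]
    print("".join(drop_redundant_names(sample)), end="")
```

Because it works line by line on any iterable, it can be fed an open file handle directly without loading the whole file.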
try to speed up blast (multiple threads?) http://seqanswers.com/forums/showthread.php?t=26085 https://wiki.hpcc.msu.edu/display/Bioinfo/BLAST+with+Multiple+Processors http://voorloopnul.com/blog/how-to-correctly-speed-up-blast-using-num_threads/
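A sketch of what passing a thread count to BLAST+ from Python might look like. `-query`, `-db`, `-out`, `-outfmt` and `-num_threads` are standard BLAST+ options; the function names, file names and the default of using every available core are assumptions for illustration. How much extra threads actually help depends on the search, as the links above discuss.

```python
import os
import subprocess

BLAST_PROGRAMS = ("blastn", "blastp", "blastx")


def build_blast_command(program, query, db, out, threads=None, outfmt=None):
    """Build the argument list for a BLAST+ search without running it."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported program: {program!r}")
    if threads is None:
        threads = os.cpu_count() or 1  # assumption: use all cores by default
    cmd = [program, "-query", query, "-db", db, "-out", out,
           "-num_threads", str(threads)]
    if outfmt is not None:
        cmd += ["-outfmt", str(outfmt)]  # e.g. "6" for tabular output
    return cmd


def run_blast(*args, **kwargs):
    """Run the search; raises CalledProcessError if BLAST exits with an error."""
    subprocess.run(build_blast_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical file and database names; only prints the command.
    print(build_blast_command("blastp", "query.fasta", "mydb", "temp_blast.txt", threads=4))
```

Keeping the program name as a parameter also covers switching between blastn, blastx and blastp without editing the script.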
add in notes about what to change for blastn, blastx, or blastp, for a different file format, or for only doing blast / only doing the copying over
also there are many notes on blast30ribo_extractCopyProteinSeq_withnotes.py
There's no reason to do blast and copy over in different steps
Send the separate files to people in case they want to do it separately
But can just copy directly into the analysis-l1 file.
With an output file, so you have records of what the results were and can check in case of errors.
And terminal output, because it takes a while.
Still can use temp blast. But if there's something, copy in; if not, don't.
Also add an option for doing blastp and blastn.
Look more at the syntax of the tempblast file to be certain of things.
add some output to run in terminal?
for now write "to only do one thing, comment out the relevant function", but later make it so that if you don't have -of or if you don't have -db, you can just do only extract or only copy. Want to make it easier for the user.
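A possible shape for the "leave out a flag to skip a step" idea, using argparse. The notes only name the flags `-of` and `-db`; this sketch assumes `-db` is the BLAST database (needed for the extract step) and `-of` is the file that sequences get copied into (needed for the copy step). If the flags mean something else, the mapping needs to change. `build_parser` and `plan_steps` are made-up names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="BLAST and/or copy sequences, depending on which flags are given.")
    # Assumed meanings; the original notes do not define these flags.
    parser.add_argument("-db", dest="db", default=None,
                        help="BLAST database; omit to skip the extract step")
    parser.add_argument("-of", dest="out_file", default=None,
                        help="file to copy sequences into; omit to skip the copy step")
    return parser


def plan_steps(args):
    """Return the steps to run, in order, based on which flags were supplied."""
    steps = []
    if args.db is not None:
        steps.append("extract")
    if args.out_file is not None:
        steps.append("copy")
    if not steps:
        raise ValueError("give -db, -of, or both")
    return steps


if __name__ == "__main__":
    print(plan_steps(build_parser().parse_args()))
```

This way nobody has to comment out a function to run only one half of the script.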
notes written into the code earlier
Oh, just realized that integrating the blast and the copying in of sequences opens this up to more problems if it stops running in the middle, because some things will be half blasted and copied. Hmm. Making sure that you don't copy in something that's already there would be good, but would also probably be slower. Oh wait, you could check the output file and see where it stopped.
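The "check the output file and see where it stopped" idea could be sketched like this. It assumes the script writes one `DONE<TAB>query_id` line to a log only after both the blast and the copy for that query have finished; that log format and the names `completed_queries` and `process_all` are hypothetical, not what the current output file contains.

```python
def completed_queries(log_path):
    """Return the set of query ids that the log marks as fully finished."""
    done = set()
    try:
        with open(log_path) as log:
            for line in log:
                if not line.endswith("\n"):
                    continue  # ignore a final line that was cut off mid-write
                parts = line.rstrip("\n").split("\t")
                if len(parts) == 2 and parts[0] == "DONE":
                    done.add(parts[1])
    except FileNotFoundError:
        pass  # no log yet means nothing has been processed
    return done


def process_all(query_ids, log_path, process_one):
    """Run process_one on each query not yet logged as done; return those processed."""
    done = completed_queries(log_path)
    processed = []
    with open(log_path, "a") as log:
        for query_id in query_ids:
            if query_id in done:
                continue
            process_one(query_id)  # blast + copy for this query
            log.write(f"DONE\t{query_id}\n")
            log.flush()  # so the record survives a later crash
            processed.append(query_id)
    return processed
```

A query that was interrupted halfway never gets its `DONE` line, so it is simply redone on the next run; combined with a check for sequences already in the target file, redoing it should not create duplicates.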
go through and clarify what each script does
make code more general
make sure comments are clear
Combine my previous file dealing codes into a multipurpose thing
Options:
Type of file extension
Edit or search
Search for what
Name of new file
Edit: replace what with what
What do deflines look like, or even copy in a block and have that be analyzed?
And default