dkoslicki / MetaPalette

Metagenomic profiling and phylogenetic distances via common kmers

not producing profile result #6

Open kemin711 opened 8 years ago

kemin711 commented 8 years ago

I have run the Classify.py program, but it does not produce the profile file which is needed for the next step.

I have added a logging utility and am trying to figure out why.

dkoslicki commented 8 years ago

Do you mind providing a bit more information? For example:

  1. Were any error/warning messages output?
  2. Do any temporary files get created (eg. /output_folder/input_file.fasta-y30.txt)?
  3. Have you tried it on a small subset of your data? If it still does not work on a small subset, you can email me the data and I can try to diagnose the problem on my end.
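
For the third point, a small subset can be cut from the start of the FASTQ file. The following is only a sketch: `head_fastq` is a hypothetical helper, not part of MetaPalette, and it assumes plain, unwrapped 4-line FASTQ records.

```python
from itertools import islice


def head_fastq(src_path, dst_path, num_reads=1000):
    """Copy the first `num_reads` records of a FASTQ file to `dst_path`.

    Assumes each record occupies exactly four lines (header, sequence,
    '+' separator, qualities). Returns the number of complete records
    written.
    """
    lines_written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        # Stream line by line so large inputs are never loaded into memory.
        for line in islice(src, 4 * num_reads):
            dst.write(line)
            lines_written += 1
    return lines_written // 4
```

Running Classify.py on the resulting file with otherwise identical options would show whether the failure depends on input size.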

kemin711 commented 8 years ago

I can send you the updated version of Classify.py, in which I added logging. The program died after line 170 and never reached line 174:

    159 for kmer_size in kmer_sizes:
    160     log.debug("working on kmer %s" % kmer_size)
    161     pool = Pool(processes = num_threads)
    162     Y = np.array(pool.map(form_y_star, izip(training_file_names, repeat(kmer_size))), dtype=np.float64)
    163     pool.close()
    164     Y_norm = Y/total_kmers
    165     Y_norms.append(Y_norm)
    166     fid = open(os.path.join(output_folder,file_base_name+"-y"+str(kmer_size)+".txt"),'w')
    167     for i in range(len(Y_norm)):
    168         fid.write(str(Y_norm[i])+"\n")
    169     fid.close()
    170 log.debug("%s Y_norms generated" % Y_norms)
    171
    172 # start of the main script
    173 #Load the common kmer matrices
    174 log.debug("Load the common kmer matrices")
    175 CKM_matrices = list()
    176 for kmer_size in kmer_sizes:
    177     fid = h5py.File(os.path.join(data_dir,"CommonKmerMatrix-"+str(kmer_size)+"mers.h5"),'r')
    178     CKM_matrices.append(np.array(fid["common_kmers"][:,:], dtype = np.float64))
    179

Output of the run:

    Skipping kmer counting step
    2016-06-10 12:30:25 [DEBUG - Classify:169] [array([ 1.63359412e-04, 1.74364678e-04, 1.30687530e-04, ..., 9.73278183e-05, 1.98782611e-04, 1.22433581e-04]), array([ 4.78041228e-05, 2.92327370e-05, 2.26983605e-05, ..., 3.16401388e-05, 5.08993538e-05, 2.44179332e-05])] Y_norms generated
    /da/bin/runclassify.sh: line 31: 25109 Killed $cmdstr
    Failed to run python /da/bin/Classify.py -d /home/users/zhouk15/hopmetag/microbiom/metapalette/Bacteria -o classifyout -i A01_ccs3_3.fastq -Q C -k sensitive -j /usr/local/bin/jellyfish -q /usr/local/bin/query_per_sequence -t 24 -n
    Fri Jun 10 12:33:20 PDT 2016

dkoslicki commented 8 years ago

Curious! That's rather odd to have the program die after 170 and before 174 considering there aren't any commands between those lines. Are you on a cluster or other managed computational resource? Considering that the command was killed, it's possible you might be hitting some sort of resource limitation and the policies are automatically killing the job.
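
One way to probe the resource-limit hypothesis is to log the process's limits and peak memory next to the existing debug messages. The sketch below uses only Python's standard `resource` module (Unix only); `memory_report` and `format_limit` are hypothetical helper names, not part of Classify.py. A kernel or scheduler kill can also happen with no per-process limit set, so an "unlimited" reading does not rule the hypothesis out.

```python
import resource


def format_limit(value):
    """Render an rlimit value, showing 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memory_report():
    """Return address-space limits and peak memory use observed so far."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    own = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Largest peak among child processes that have already been waited for
    # (e.g. finished multiprocessing workers); 0 if there are none yet.
    children = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "as_soft_limit": format_limit(soft),
        "as_hard_limit": format_limit(hard),
        "peak_rss_self": own,          # kilobytes on Linux, bytes on macOS
        "peak_rss_children": children,
    }


if __name__ == "__main__":
    print(memory_report())
```

Calling `memory_report()` just before the common kmer matrices are loaded would show how close the run is to any configured limit at the point where it dies.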

kemin711 commented 8 years ago

David, I just sent you a test FASTQ dataset. I updated your source code with a Makefile and a shell-script driver that could be turned into a more generic interface for your Python code, and I added a logging facility for easier debugging. The program might have crashed while Python was trying to deallocate memory.
