galaxy001 / pirs

Profile-based Illumina pair-end Reads Simulator
https://code.google.com/p/pirs/
GNU General Public License v2.0

Change files names and reads names (modified) #5

Closed florealcab closed 7 years ago

florealcab commented 7 years ago

We usually use the FASTQ file names as sample names, so the user must specify the name without any suffix. We also add this name as a prefix to every read name, so read names are unique, even across several samples.

(Adds the missing updates from the previous pull request.)
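To illustrate why the sample-name prefix keeps read names unique across samples, here is a minimal sketch. The layout follows the original `printf` format string in the `pirs_simulate.cpp` diff quoted later in this thread; the function `make_read_name` and its parameters are hypothetical and are not part of pirs.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not pirs code): builds a read header such as
// "@sampleA_read_7/1". '@' marks a FASTQ record (quality values present),
// '>' marks a FASTA record, mirroring the ternary in the quoted diff.
std::string make_read_name(const std::string &indiv_name,
                           unsigned long long pair_number,
                           int num_in_pair,
                           bool has_quality)
{
	assert(num_in_pair == 1 || num_in_pair == 2); // a pair has two reads
	std::ostringstream os;
	os << (has_quality ? '@' : '>') << indiv_name
	   << "_read_" << pair_number << '/' << num_in_pair;
	return os.str();
}
```

With this layout, pair 7 of two different samples yields different read names because the prefix differs, even though the pair number is the same.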

galaxy001 commented 7 years ago

Would you please use tabs for indentation instead of 4 spaces?

Reading a diff that changes every line is quite a burden.

florealcab commented 7 years ago

I just committed a change that replaces the spaces with tabs.

galaxy001 commented 7 years ago

The output dir may not exist, and pirs fails to create the output file without it. Please add a check and create the dir if it is missing.

Dumping simulation parameters in the read name is useful, so please restore them as follows:

--- a/src/pirs/pirs_simulate.cpp
+++ b/src/pirs/pirs_simulate.cpp
@@ -677,9 +677,10 @@ SimulationProfiles::~SimulationProfiles()
  */
 static void output_read(const Read &read, OutputStream &out_file)
 {
-       out_file.printf("%c%s_read_%"PRIu64"/%d\n",
+       out_file.printf("%c%s_read_%d_%"PRIu64"/%d\n",
                                        (read.quality_vals.empty()) ? '>' : '@',
-                                       read.indiv_name.c_str(), read.pair.pair_number, read.num_in_pair());
+                                       read.indiv_name.c_str(), read.pair.insert_len_mean,
+                                       read.pair.pair_number, read.num_in_pair());
        out_file.write(&read.seq[0], read.seq.size());
        out_file.putc('\n');

florealcab commented 7 years ago

The output directory is now checked and created if it does not exist (using the Boost library, to be cross-platform). We need to be able to leave the simulation info out of the read names, so I restored the dump as you suggested and added an option to disable it (the option is not enabled by default).

galaxy001 commented 7 years ago

I found that stat is already available with _POSIX_C_SOURCE >= 200112L, so we do not need the heavy Boost dependency. At least both Linux and Mac have a man page for it. See https://stackoverflow.com/a/9314702/159695 or https://www.quora.com/How-do-I-check-if-a-directory-exists-in-Linux-using-C%2B%2B/answer/Syd-Logan?srid=uBEo.

On dumping the info, I think we can make indiv-name optional and let it act as the switch. If indiv-name is supplied, nothing is dumped, as in your earlier version. If it is not supplied, indiv-name defaults to Sim_%read_len_%insert_len_mean.
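A minimal sketch of the stat-based check suggested above, assuming a POSIX system (Linux or macOS). The helper name `ensure_output_dir` and the `0755` mode are assumptions for illustration; this is not the code that was committed to pirs.

```cpp
#include <sys/stat.h>   // stat, mkdir, S_ISDIR (POSIX)
#include <sys/types.h>
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper (not pirs code). Returns true if `path` is an
// existing directory or could be created. Only the last path component
// is created; there is no recursive "mkdir -p" behaviour.
bool ensure_output_dir(const std::string &path)
{
	assert(!path.empty());                // an empty path is a caller bug
	struct stat st;
	if (stat(path.c_str(), &st) == 0)
		return S_ISDIR(st.st_mode);       // exists: fine only if a directory
	if (errno != ENOENT)
		return false;                     // e.g. permission denied
	if (mkdir(path.c_str(), 0755) == 0)
		return true;                      // created it
	// Another process may have created the directory in the meantime.
	return errno == EEXIST && stat(path.c_str(), &st) == 0
	       && S_ISDIR(st.st_mode);
}
```

The caller would invoke this once before opening any output file and stop with an error message if it returns false. A missing parent directory is reported as a failure rather than created silently.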
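The switch described above could look roughly like the sketch below. It assumes the default `Sim_%read_len_%insert_len_mean` means the literal prefix `Sim_` followed by the two numeric values separated by an underscore; the function name `effective_indiv_name` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not pirs code). If the user supplied a sample
// name, use it unchanged, so no simulation parameters appear in the read
// names. Otherwise fall back to a default that encodes the read length
// and the mean insert length.
std::string effective_indiv_name(const std::string &supplied,
                                 int read_len,
                                 int insert_len_mean)
{
	if (!supplied.empty())
		return supplied;
	assert(read_len > 0 && insert_len_mean > 0); // sanity check on defaults
	std::ostringstream os;
	os << "Sim_" << read_len << '_' << insert_len_mean;
	return os.str();
}
```

This keeps a single option instead of adding a separate "do not dump" flag: supplying indiv-name both names the sample and turns the parameter dump off.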

florealcab commented 7 years ago

As you asked, I have removed the Boost dependency and use stat instead. I also now use the indiv_name parameter as the switch. Tell me if there is still a problem.