DavidsonGroup / flexiplex

The Flexible Demultiplexer
https://davidsongroup.github.io/flexiplex/
MIT License

adding verbose flag #31

Closed: ghar1821 closed this issue 8 months ago

ghar1821 commented 12 months ago

Hi guys,

Is there a verbose option that prints out some form of progress? The progress can be as simple as printing how many reads in the input it has processed so far.

Also, I noticed the new version does not write the output as it processes the reads. Rather, it maps everything first and then writes the result. Would it be worth writing the results out as each read is mapped?

I know this is complicated because there is now a multi-threaded option, but maybe each thread could write out its own results, which are then merged together at the end?
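A minimal sketch of the per-thread idea, assuming the reads are already in memory and split into contiguous slices. Each thread appends to its own std::ostringstream, so no locking is needed, and the buffers are concatenated in thread order at the end so the output keeps the input order. process_read and demux_with_thread_buffers are hypothetical names; this is not how Flexiplex is actually structured.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-read work: stands in for barcode matching.
std::string process_read(const std::string& read) {
    return read + "\tmatched\n";
}

// Each thread handles one contiguous slice of reads and writes only to its own
// buffer, so threads never contend for the output. Buffers are merged in thread
// order after all threads have joined, which preserves the input order.
std::string demux_with_thread_buffers(const std::vector<std::string>& reads,
                                      unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<std::ostringstream> buffers(n_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (reads.size() + n_threads - 1) / n_threads;

    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(reads.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                buffers[t] << process_read(reads[i]);
        });
    }
    for (auto& w : workers) w.join();

    std::string merged;
    for (auto& b : buffers) merged += b.str();
    return merged;
}
```

The trade-off is that nothing reaches disk until every thread finishes; a streaming variant would instead flush each batch's buffer, in order, as soon as that batch completes.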

nadiadavidson commented 12 months ago

Hi, it should be printing out updates and writing to the reads file as it goes. Only a small amount of buffering happens, which is usually not noticeable. Could it be an I/O issue on the HPC? Or perhaps it is just running so slowly in your case that the updates are too far apart? Feel free to message us the command you ran so we can reproduce it.

Cheers, Nadia.

olliecheng commented 12 months ago

the progress can be as simple as printing out how many reads in the input it has processed thus far.

Flexiplex currently prints the number of processed reads to standard error. It looks something like this:

FLEXIPLEX 1.00.1
Setting known barcodes from filtered_barcodes.txt
Number of known barcodes: 182
Setting number of threads to 16
Using default search pattern: 
primer: CTACACGACGCTCTTCCGATCT
BC: ????????????????
UMI: ????????????
polyA: TTTTTTTTT
For usage information type: flexiplex -h
Searching for barcodes...
0.1 million reads processed..
0.2 million reads processed..
0.3 million reads processed..
0.4 million reads processed..
0.5 million reads processed..
0.6 million reads processed..
0.7 million reads processed..
[etc]

Is this perhaps what you're looking for?
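For illustration, the fixed-interval reporting seen in this log can be sketched as below. This is a hypothetical helper (progress_message and kReportInterval are made-up names), not Flexiplex's actual code; it only assumes a fixed 100,000-read interval and the "N million reads processed.." message format shown above.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Assumed interval, matching the 0.1-million steps in the log above.
constexpr std::size_t kReportInterval = 100000;

// Returns the progress line to print (e.g. to std::cerr) after
// `reads_processed` reads, or std::nullopt if this count is not a reporting point.
std::optional<std::string> progress_message(std::size_t reads_processed) {
    if (reads_processed == 0 || reads_processed % kReportInterval != 0)
        return std::nullopt;
    std::ostringstream msg;
    msg << static_cast<double>(reads_processed) / 1e6 << " million reads processed..";
    return msg.str();
}
```

In a read loop this would be used as `if (auto m = progress_message(n)) std::cerr << *m << '\n';`. For an input with fewer than 100,000 reads it never returns a message, which matches the behaviour reported in the next comment.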

ghar1821 commented 12 months ago

Yep! I just realised that if there are fewer than 100,000 reads in the file, you won't get any progress output, because the first update is only printed after 100,000 reads have been processed. That's why I haven't been seeing any progress updates.

Is there a way to specify the verbosity level, or to report progress as a percentage instead, e.g. '10% of the reads processed'? That way, files with fewer than 100,000 reads would still show progress.

olliecheng commented 12 months ago

Would be an interesting thing to implement, thanks for the idea. We could get the file size from the system and compare it with the number of bytes read so far to get a percentage. Would be a nice little quality-of-life improvement! Will definitely add it to the to-do list :)
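A rough sketch of the percentage approach, assuming C++17 std::filesystem::file_size and plain (uncompressed) text input read line by line. The byte count is reconstructed from line lengths plus one newline per line; scan_with_percent and percent_done are hypothetical names, and the message wording is made up. It prints a message each time another 10% of the file's bytes has been consumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ostream>
#include <string>

// Percentage of the file consumed so far (integer, rounded down).
int percent_done(std::uintmax_t bytes_read, std::uintmax_t total_bytes) {
    if (total_bytes == 0) return 100;
    return static_cast<int>(bytes_read * 100 / total_bytes);
}

// Hypothetical scanner: counts lines in `path` and writes a message to `log`
// at every 10% step of bytes consumed. Requires C++17 for std::filesystem.
std::size_t scan_with_percent(const std::string& path, std::ostream& log) {
    const std::uintmax_t total = std::filesystem::file_size(path);
    std::ifstream in(path);
    std::string line;
    std::uintmax_t bytes_read = 0;
    int next_report = 10;
    std::size_t lines = 0;
    while (std::getline(in, line)) {
        bytes_read += line.size() + 1;  // +1 for the newline getline strips
        ++lines;
        const int pct = percent_done(bytes_read, total);
        // Catch up on every threshold crossed, so large lines don't skip a step.
        while (pct >= next_report && next_report <= 100) {
            log << next_report << "% of input processed" << std::endl;
            next_report += 10;
        }
    }
    return lines;
}
```

Because reporting is driven by bytes rather than read counts, even a small file produces all ten messages.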

olliecheng commented 12 months ago

(For future me: the main concern is that this would need to be robust and recognise when input is piped rather than read from a file. The easiest method would also require C++17, or else sacrifice portability.)
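One way to handle the piped-input concern, sketched with C++17 std::filesystem: only report percentages when the input path names a regular file. Stdin and FIFOs (for example bash process substitution, which shows up as a /dev/fd/N path) are not regular files and have no size known up front, so the helper returns std::nullopt and the caller can fall back to read-count messages. The function name and the use of "-" to mean stdin are assumptions for illustration, not Flexiplex conventions.

```cpp
#include <cstdint>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Returns the input size in bytes only when a percentage is meaningful,
// i.e. when `path` names a regular file (symlinks are followed).
std::optional<std::uintmax_t> input_size_if_known(const std::string& path) {
    if (path.empty() || path == "-") return std::nullopt;  // assumed stdin convention
    std::error_code ec;
    const std::filesystem::path p(path);
    // FIFOs, character devices, directories and missing files all fail here.
    if (!std::filesystem::is_regular_file(p, ec) || ec) return std::nullopt;
    const std::uintmax_t size = std::filesystem::file_size(p, ec);
    if (ec) return std::nullopt;
    return size;
}
```

Using the error_code overloads keeps the check from throwing, so an unusual input simply disables percentage reporting instead of aborting the run.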

olliecheng commented 8 months ago

@ghar1821 Try the latest commit (referenced above), which addresses this issue by displaying messages every 10,000 reads, for the first 100,000 reads. Is that enough precision?
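The schedule described in this comment can be expressed as a small predicate. This is a sketch of the described behaviour, not the code in the referenced commit; should_report is a hypothetical name, and it assumes updates continue every 100,000 reads after the first 100,000, as in the earlier log.

```cpp
#include <cstddef>

// Report every 10,000 reads up to and including 100,000 reads,
// then every 100,000 reads after that.
bool should_report(std::size_t reads_processed) {
    if (reads_processed == 0) return false;
    const std::size_t interval = reads_processed <= 100000 ? 10000 : 100000;
    return reads_processed % interval == 0;
}
```

Under this schedule, a 30,000-read file would produce three progress messages (at 10,000, 20,000 and 30,000 reads), while long runs still get the coarser 100,000-read updates.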

ghar1821 commented 8 months ago

Thanks @olliecheng. Will have a go when I get the chance!