Open jelber2 opened 2 years ago
Nice, that's a new one to me! When it comes to performance with MultiQC the answer is usually "it depends". Depends on which modules you're running, how many files and so on.
I did a little bit of playing around with profiling a while back (see docs) and there's a --profile-runtime flag to report how much time different parts of the run are taking (docs) if you're curious.
We looked into parallelising the file search too (https://github.com/ewels/MultiQC/pull/1508) but it's not trivial and not yet merged.
However, as you say - how much it's worth working on this stuff is maybe questionable. Most of the time, MultiQC is fairly fast to run. And when it's slow it's likely at least partially I/O bound.
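One rough way to check whether a file-scanning workload is I/O bound is to compare wall-clock time with CPU time for the same piece of work. The sketch below is not MultiQC's file search. `measure` and `scan_tree` are hypothetical helpers that walk a directory and read the first bytes of each file, which is only loosely similar to what a log search does.

```python
import os
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


def scan_tree(root, max_bytes=1024):
    """Hypothetical stand-in for a file search: read the head of every file."""
    files_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as handle:
                    handle.read(max_bytes)
                files_read += 1
            except OSError:
                # Skip unreadable files (permissions, broken symlinks, ...).
                continue
    return files_read


if __name__ == "__main__":
    count, wall, cpu = measure(scan_tree, ".")
    cpu_share = cpu / max(wall, 1e-9)
    print(f"{count} files, wall {wall:.2f}s, cpu {cpu:.2f}s, cpu share {cpu_share:.0%}")
```

A CPU share close to 100% suggests the interpreter is the bottleneck, so a faster runtime could help. A much lower share suggests the process is mostly waiting on the filesystem.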
OK, I might play around with it. Usually my longest run time is no more than 10 minutes when MultiQC searches through hundreds of files, so I do not think it would be a really big deal if there were improvements using pyston.
Note: tested on a shared server with other users, so the timings may be noisy, but perhaps the results are at least instructive.
# install multiqc in a python3 environment
python3 -m venv ~/bin/multiqc-python
source ~/bin/multiqc-python/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install multiqc
deactivate
# install multiqc in a pyston3 environment
pyston3 -m venv ~/bin/multiqc-pyston
source ~/bin/multiqc-pyston/bin/activate
pyston3 -m pip install --upgrade pip
pyston3 -m pip install multiqc
deactivate
# benchmark pyston multiqc install
source ~/bin/multiqc-pyston/bin/activate
time multiqc --profile-runtime --ignore '*mate1' --ignore '*mate2' --title pyston ../analysis.1/qc/ quants/ ../analysis.1/logs/rRNA/ ../analysis.2/mapped/
/// MultiQC 🔍 | v1.13
| multiqc | Report title: pyston
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/qc
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/quants
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/logs/rRNA
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/mapped
| searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 648/648
| bbmap | Found 19 reports
| salmon | Found 19 meta reports
| star | Found 19 reports
| fastqc | Found 76 reports
| profile_runtime | Running run time profiling module
| multiqc | Compressing plot data
| multiqc | Report : pyston_multiqc_report.html
| multiqc | Data : pyston_multiqc_report_data
| multiqc | MultiQC complete
| multiqc | Run took 72.48 seconds
| multiqc | - 61.24s: Searching files
| multiqc | - 7.77s: Running modules
| multiqc | - 0.37s: Compressing report data
| multiqc | For more information, see the 'Run Time' section in pyston_multiqc_report.html
| multiqc | 1 flat-image plot used in the report due to large sample numbers
| multiqc | To force interactive plots, use the '--interactive' flag. See the documentation.
real 1m13.756s
user 0m58.090s
sys 0m4.446s
deactivate
# benchmark python multiqc install
source ~/bin/multiqc-python/bin/activate
time multiqc --profile-runtime --ignore '*mate1' --ignore '*mate2' --title python ../analysis.1/qc/ quants/ ../analysis.1/logs/rRNA/ ../analysis.2/mapped/
/// MultiQC 🔍 | v1.13
| multiqc | Report title: python
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/qc
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/quants
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/logs/rRNA
| multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/mapped
| searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 648/648
| bbmap | Found 19 reports
| salmon | Found 19 meta reports
| star | Found 19 reports
| fastqc | Found 76 reports
| profile_runtime | Running run time profiling module
| multiqc | Compressing plot data
| multiqc | Report : python_multiqc_report.html
| multiqc | Data : python_multiqc_report_data
| multiqc | MultiQC complete
| multiqc | Run took 118.17 seconds
| multiqc | - 103.70s: Searching files
| multiqc | - 9.91s: Running modules
| multiqc | - 0.83s: Compressing report data
| multiqc | For more information, see the 'Run Time' section in python_multiqc_report.html
| multiqc | 1 flat-image plot used in the report due to large sample numbers
| multiqc | To force interactive plots, use the '--interactive' flag. See the documentation.
real 1m59.559s
user 1m52.737s
sys 0m4.822s
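To make the two logs easier to compare, here is a small sketch that parses the timing lines and computes per-phase ratios. The log excerpts are copied from the runs above. `parse_timings` and `speedups` are hypothetical helper names, not part of MultiQC. Each configuration was run only once on a shared server, so the ratios are indicative at best.

```python
import re

# Timing lines copied verbatim from the two runs above.
PYSTON_LOG = """\
| multiqc | Run took 72.48 seconds
| multiqc | - 61.24s: Searching files
| multiqc | - 7.77s: Running modules
| multiqc | - 0.37s: Compressing report data
"""

PYTHON_LOG = """\
| multiqc | Run took 118.17 seconds
| multiqc | - 103.70s: Searching files
| multiqc | - 9.91s: Running modules
| multiqc | - 0.83s: Compressing report data
"""


def parse_timings(log):
    """Return {phase name: seconds} from --profile-runtime style log lines."""
    timings = {}
    for line in log.splitlines():
        total = re.search(r"Run took ([\d.]+) seconds", line)
        phase = re.search(r"- ([\d.]+)s: (.+)$", line)
        if total:
            timings["Total"] = float(total.group(1))
        elif phase:
            timings[phase.group(2).strip()] = float(phase.group(1))
    return timings


def speedups(baseline, candidate):
    """Ratio baseline / candidate for every phase present in both runs."""
    return {name: baseline[name] / candidate[name]
            for name in baseline if name in candidate}


python_times = parse_timings(PYTHON_LOG)
pyston_times = parse_timings(PYSTON_LOG)

for name, ratio in speedups(python_times, pyston_times).items():
    print(f"{name}: {python_times[name]:.2f}s -> {pyston_times[name]:.2f}s "
          f"({ratio:.2f}x)")
```

On these numbers the pyston run is roughly 1.63x faster overall and roughly 1.69x faster in the file search, which accounts for most of the run time in both cases.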
Description of feature
Hi,
I have used MultiQC, especially with bioconda installs. Would there be any interest in trying MultiQC with https://github.com/pyston/pyston? I have had some issues porting Python packages to pyston (UMItools was tough) but have not tried MultiQC. It is just a thought; there may be no major improvement when parsing a lot of files, as MultiQC might be I/O bound.