MultiQC / MultiQC

Aggregate results from bioinformatics analyses across many samples into a single report.
http://multiqc.info
GNU General Public License v3.0

Might be interesting to try MultiQC with pyston #1764

Open jelber2 opened 2 years ago

jelber2 commented 2 years ago

Description of feature

Hi,

I have used MultiQC, mostly through bioconda installs. Would there be any interest in trying MultiQC with https://github.com/pyston/pyston ? I have had some trouble porting Python packages to Pyston (UMItools was tough), but I have not tried MultiQC yet. It is just a thought; there may be no major improvement when parsing a lot of files, since MultiQC might be I/O bound.

ewels commented 2 years ago

Nice, that's a new one to me! When it comes to performance with MultiQC the answer is usually "it depends". Depends on which modules you're running, how many files and so on.

I did a little bit of playing around with profiling a while back (see docs) and there's a --profile-runtime flag to report how much time different parts of the run are taking (docs) if you're curious.

We looked into parallelising the file search too (https://github.com/ewels/MultiQC/pull/1508) but it's not trivial and not yet merged.

However, as you say - how much it's worth working on this stuff is maybe questionable. Most of the time, MultiQC is fairly fast to run. And when it's slow it's likely at least partially I/O bound.

jelber2 commented 2 years ago

OK, I might play around with it. For me, the longest run time is usually no more than 10 minutes when MultiQC searches through hundreds of files, so I do not think it would be a really big deal even if Pyston brought improvements.

jelber2 commented 2 years ago

Note: tested on a shared server with other users, but perhaps the results are at least instructive.

# install multiqc in a python3 environment
python3 -m venv ~/bin/multiqc-python
source ~/bin/multiqc-python/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install multiqc
deactivate

# install multiqc in a pyston3 environment
pyston3 -m venv ~/bin/multiqc-pyston
source ~/bin/multiqc-pyston/bin/activate
pyston3 -m pip install --upgrade pip
pyston3 -m pip install multiqc
deactivate

# benchmark pyston multiqc install
source ~/bin/multiqc-pyston/bin/activate
time multiqc --profile-runtime --ignore '*mate1' --ignore '*mate2' --title pyston ../analysis.1/qc/ quants/ ../analysis.1/logs/rRNA/ ../analysis.2/mapped/

  /// MultiQC 🔍 | v1.13

|           multiqc | Report title: pyston
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/qc
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/quants
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/logs/rRNA
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/mapped
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 648/648  
|             bbmap | Found 19 reports
|            salmon | Found 19 meta reports
|              star | Found 19 reports
|            fastqc | Found 76 reports
|   profile_runtime | Running run time profiling module
|           multiqc | Compressing plot data
|           multiqc | Report      : pyston_multiqc_report.html
|           multiqc | Data        : pyston_multiqc_report_data
|           multiqc | MultiQC complete
|           multiqc | Run took 72.48 seconds
|           multiqc |  - 61.24s: Searching files
|           multiqc |  - 7.77s: Running modules
|           multiqc |  - 0.37s: Compressing report data
|           multiqc | For more information, see the 'Run Time' section in pyston_multiqc_report.html
|           multiqc | 1 flat-image plot used in the report due to large sample numbers
|           multiqc | To force interactive plots, use the '--interactive' flag. See the documentation.

real    1m13.756s
user    0m58.090s
sys     0m4.446s
deactivate

# benchmark python multiqc install
source ~/bin/multiqc-python/bin/activate
time multiqc --profile-runtime --ignore '*mate1' --ignore '*mate2' --title python ../analysis.1/qc/ quants/ ../analysis.1/logs/rRNA/ ../analysis.2/mapped/

  /// MultiQC 🔍 | v1.13

|           multiqc | Report title: python
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/qc
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/quants
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.1/logs/rRNA
|           multiqc | Search path : /nfs/scistore16/itgrp/bioinf/projects/DA0037/2022_June_15/analysis.2/mapped
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 648/648  
|             bbmap | Found 19 reports
|            salmon | Found 19 meta reports
|              star | Found 19 reports
|            fastqc | Found 76 reports
|   profile_runtime | Running run time profiling module
|           multiqc | Compressing plot data
|           multiqc | Report      : python_multiqc_report.html
|           multiqc | Data        : python_multiqc_report_data
|           multiqc | MultiQC complete
|           multiqc | Run took 118.17 seconds
|           multiqc |  - 103.70s: Searching files
|           multiqc |  - 9.91s: Running modules
|           multiqc |  - 0.83s: Compressing report data
|           multiqc | For more information, see the 'Run Time' section in python_multiqc_report.html
|           multiqc | 1 flat-image plot used in the report due to large sample numbers
|           multiqc | To force interactive plots, use the '--interactive' flag. See the documentation.

real    1m59.559s
user    1m52.737s
sys     0m4.822s
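
To put the two runs side by side, here is a rough sketch that turns the timings above into speedup ratios and a CPU share, (user + sys) / real. The helper names `speedup` and `cpu_percent` are made up for this illustration and are not part of MultiQC. The numbers are copied by hand from the logs above, with the `time` output converted to seconds.

```shell
#!/usr/bin/env bash

# Hypothetical helper: baseline seconds / candidate seconds, two decimals.
speedup() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2f\n", base / cand }'
}

# Hypothetical helper: share of wall-clock time spent on CPU,
# computed as 100 * (user + sys) / real, rounded to a whole percent.
cpu_percent() {
  awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.0f\n", 100 * (u + s) / r }'
}

# python run first, pyston run second (values from --profile-runtime)
echo "total:     $(speedup 118.17 72.48)x"
echo "searching: $(speedup 103.70 61.24)x"
echo "modules:   $(speedup 9.91 7.77)x"

# arguments: real, user, sys (seconds, from the `time` output)
echo "pyston CPU share: $(cpu_percent 73.756 58.090 4.446)%"
echo "python CPU share: $(cpu_percent 119.559 112.737 4.822)%"
```

On these numbers the pyston run comes out about 1.63x faster overall and about 1.69x faster in the file search, with CPU shares of roughly 85% (pyston) and 98% (python). That would suggest the file search was mostly CPU bound on this machine rather than I/O bound, although a single run per interpreter on a shared server is thin evidence.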