Open anani-a-missinou opened 4 years ago
Hey 2AMissinou,
Grine got a complete overhaul with many new features and much better performance (it also looks much better now). I am currently preparing a manuscript for the new version. However, the repo is already public and can be used.
You can find the latest working version at: https://github.com/Kawue/grine-v2/tree/dev For data generation you need to use: https://github.com/Kawue/msi-community-detection
In general, your data should work without problems. However, you need HDF5 data to use the MSI community detection software. The pipeline referenced in the repository is currently not available, but if your data is already processed and peak-picked, you can use a parser: https://github.com/Kawue/imzML-to-HDF5
Since GRINE maps your image data to a graph, your data needs to be peak-picked. Otherwise the number of nodes and edges will cause performance issues.
For further and more specific questions, you are welcome to email me.
Dear Karsten Wüllems,
First of all, thank you for your support.
I tried to convert my .ibd and .imzML files, but I couldn't get the imzML-to-HDF5 tool to work. After installing Anaconda on Windows, I put the .imzML and .ibd files in the GRINE\imzML-to-HDF5-master directory and ran your script.
*** Error details ****
PS D:\Parts\Metabolomics\DATA\trated_data\GRINE\imzML-to-HDF5-master> create -f environment.yml
create : The term 'create' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ CategoryInfo : ObjectNotFound: (create:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
I also tested this on our cluster, but it does not work there either and gives the same error. Please help me run imzML-to-HDF5 successfully so that I can use msi-community-detection.
Thank you. Anani
Hey Anani,
Sorry, this is my fault; I made a mistake while writing the manual. The call should be `conda env create -f .\environment.yml`. And remember that the `activate msiparse` command does not work in PowerShell; you have to use cmd.
If you get further errors, just forward them to me.
Dear Karsten Wüllems,
With the new command, the installation went like clockwork. 👍
I used the Windows command line (cmd), but the program execution does not work. I have attached a screenshot of my different attempts. Could you please help me again?
THANK YOU FOR YOUR SUPPORT
Well, your first call (providing the .imzML file to -i) was the correct one. However, it seems your data is not well aligned, since the parser reads different m/z values for different spectra. I tried to push a fix, but I do not have such data at hand for testing. Try the new code by downloading the repo again (or pulling if you use git). The current solution is not ideal, but it should work, as I simply fill missing values with zeros. Providing well-aligned data would be the better choice. However, our new pipeline will be released soon, making this repo unnecessary.
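For readers following along, the zero-filling described above can be sketched as follows. This is only an illustration under assumptions, not the code in msiproc.py: the function name `align_spectra` and the toy spectra are hypothetical, and m/z values are matched by exact equality.

```python
def align_spectra(spectra):
    """Align per-pixel spectra onto one shared m/z axis.

    `spectra` is a list of (mz_values, intensities) pairs, one per pixel.
    Returns the sorted common m/z axis and one intensity row per pixel,
    with 0.0 wherever a pixel has no entry for an m/z value.
    """
    # Common axis: sorted union of every m/z value seen in any spectrum.
    common_mz = sorted({mz for mzs, _ in spectra for mz in mzs})
    index = {mz: i for i, mz in enumerate(common_mz)}

    rows = []
    for mzs, intensities in spectra:
        row = [0.0] * len(common_mz)  # missing values default to zero
        for mz, intensity in zip(mzs, intensities):
            row[index[mz]] = intensity
        rows.append(row)
    return common_mz, rows


# Hypothetical toy data: two pixels with partly different m/z values.
spectra = [
    ([100.0, 101.5, 250.2], [5.0, 1.0, 9.0]),
    ([100.0, 250.2, 300.7], [4.0, 8.0, 2.0]),
]
common_mz, rows = align_spectra(spectra)
print(common_mz)
for row in rows:
    print(row)
```

Note that every distinct m/z value becomes its own column, so spectra that differ only by tiny m/z shifts make the table very wide. That is one reason well-aligned or peak-picked input is preferable.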
Dear Karsten Wüllems,
With both the first code and the new version obtained via git, it aborts after running for a long time, printing the WARNING message continuously and writing no file to the output directory. Do you know another way to align my data before running msiproc.py?
THANK YOU SO MUCH FOR YOUR HELP
Oh well, I placed the print statement in a very awkward place. Is your data confidential or would it be possible to share one imzML file with me?
Could you give me your email address? I will send you an example .imzML and .ibd file. I hope this will help us get the conversion working.
kwuellems@techfak.uni-bielefeld, is that your current email? Please confirm.
THANKS
This one is fine. You can also use wuellems@cebitec.uni-bielefeld.de. They all route to the same place. I already pushed another hotfix, but this way I can test it myself, which makes fixing things much easier.
It's sent. Please confirm receipt.
Got it. I hope to find a solution today; Monday at the latest.
Well, looks like I was wrong. Your data looks aligned but seems to be "condensed", meaning that it does not contain zero values and mostly consists of peaks only. It looks like each spectrum was picked individually, so each spectrum has a different length. I converted it into an HDF5 file and printed the mean spectrum to check whether it looks OK. It looks fine to me, as far as I, as a bioinformatician, can judge this. So if you use the latest version of the script, the conversion should work.
However, I found that your data contains duplicates; I spotted at least one. The second spectrum entry of your imzML data contains one m/z value twice, with the exact same intensity value. To be precise, it is m/z 171.30209351 with the intensity 74594.36. If you did no preprocessing, this looks like some kind of instrument error or vendor software bug. However, the script is now able to handle this problem.
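A check for the kind of duplicate described above could look roughly like this. It is a sketch under assumptions, not the fix that went into the converter: the helper name `drop_duplicate_peaks` is hypothetical, and apart from the duplicated pair quoted above, the example values are made up.

```python
def drop_duplicate_peaks(mzs, intensities):
    """Remove exact repeats of an (m/z, intensity) pair, keeping the first."""
    seen = set()
    clean_mzs, clean_intensities = [], []
    for mz, intensity in zip(mzs, intensities):
        key = (mz, intensity)
        if key in seen:
            continue  # exact duplicate entry, skip it
        seen.add(key)
        clean_mzs.append(mz)
        clean_intensities.append(intensity)
    return clean_mzs, clean_intensities


# The duplicated pair is the one reported above; 150.1 and 180.4 are made up.
mzs = [150.1, 171.30209351, 171.30209351, 180.4]
intensities = [10.0, 74594.36, 74594.36, 3.5]
clean_mzs, clean_intensities = drop_duplicate_peaks(mzs, intensities)
print(clean_mzs)
print(clean_intensities)
```

Only exact repeats are removed here. Two entries with the same m/z but different intensities would need a separate decision (for example summing or taking the maximum).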
Hi dear Karsten Wüllems,
Thanks for the support, and sorry for the delay; I have been in a Ph.D. traineeship since the beginning of the week.
I am surprised by the presence of these duplicate m/z values in my data. I agree, maybe it comes from a technical error. I ran the new script and got a memory error with the same files I sent you.
I have an Intel(R) Core I7-850H CPU, 2.20GHz 2.21 GHz, 16GB RAM, 64-bit, x64 processor. Is this memory error expected? How much memory did the machine you ran it on have?
I'm trying to use it on our computing cluster. I'll get back to you as soon as I have something new.
The code is quite memory inefficient, I ran it on 32GB.
Hi dear Karsten Wüllems
To get more computing capacity, I tried to install your tool on our computer cluster, using the following commands:
conda create -n imzML-to-HDF5
conda activate imzML-to-HDF5
cd /home/genouest/inra_umr1349/amissino/miniconda3/envs/imzML-to-HDF5
wget https://github.com/Kawue/imzML-to-HDF5/archive/master.zip
unzip master.zip
cd imzML-to-HDF5-master/
mv * ..
conda env create -f \environment.yml
** ERROR *****
(imzML-to-HDF5) [amissino@genossh:imzML-to-HDF5] $ conda env create -f \environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
Even after a manual installation, the packages are not detected. Can you help me install this on our computer cluster? Thanks.
I assume you have a Linux cluster. This is a Windows-specific problem: the strings after the equals sign are Windows-specific build identifiers that cannot be resolved on Linux. I uploaded a new environment file without them.
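To illustrate the idea, here is a small sketch that removes build identifiers from `name=version=build` dependency lines. It is not how the repository's environment.yml was actually regenerated; the function name and the build string in the example are hypothetical. When exporting from an existing environment, `conda env export --no-builds` gives a similar result directly.

```python
def strip_build_strings(lines):
    """Turn 'name=version=build' dependency lines into 'name=version'."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        is_conda_pin = (
            stripped.startswith("- ")
            and stripped.count("=") == 2
            and "==" not in stripped  # leave pip-style 'pkg==1.0' pins alone
        )
        if is_conda_pin:
            head, _build = line.rsplit("=", 1)  # drop the trailing build part
            cleaned.append(head)
        else:
            cleaned.append(line)
    return cleaned


# Hypothetical excerpt of an exported environment file.
example = [
    "name: msiparse",
    "dependencies:",
    "  - numpy=1.17.3=py38h95a1406_0",
    "  - python=3.8.0",
    "  - pip:",
    "    - pyimzml==1.2.6",
]
portable = strip_build_strings(example)
print("\n".join(portable))
```

Dropping the build part lets conda pick a build that matches the target platform, at the cost of slightly less exact reproducibility.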
Also, you should not need `conda create -n imzML-to-HDF5` and `conda activate imzML-to-HDF5`. It is enough to download and unpack the package and call `conda env create -f environment.yml`.
Sorry for all these errors. I only tested the tools on my local system. Finding them now helps a lot.
Found a second potential problem and updated the yaml again.
Hi dear Karsten,
I successfully installed it with the new file, but with some complications around the OpenCV package, which was not installed in the conda environment created for this purpose. I moved the cv2 and OpenCV files into miniconda3/envs/msiparse2/lib/python3.8/site-packages/.
With the same files I sent you, it runs, but the process gets killed with this error message:
*** Error message **
(base) [amissino@cl1n016:imzML-to-HDF5] $ conda activate msiparse2
(msiparse2) [amissino@cl1n016:imzML-to-HDF5] $ python msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
Killed
Thank you for your help.
Seems like miniconda had problems with OpenCV. There was also a missing dependency that pandas needs. Again, I updated the environment.yml to fix both problems.
The warnings you got are fine. I print them myself to make the user aware of what is going on. There are no details about why it got killed. I would assume a problem with PyTables, which is now included in the new environment.yml.
Try to install the env again and run python with the -u flag, i.e. python -u msiproc.py [.....]. You can also run /usr/bin/time -v python -u for some details about the time that was needed.
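As a complement to the `-u` flag and `/usr/bin/time -v`, a script can also report its own progress and peak memory. The sketch below is not part of msiproc.py; it uses the standard `resource` module (Unix only), the helper name `peak_memory_mib` is hypothetical, and the list allocation merely stands in for real work.

```python
import resource
import sys


def peak_memory_mib():
    """Peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":  # macOS reports bytes, Linux reports KiB
        return peak / (1024 * 1024)
    return peak / 1024


# flush=True has a similar effect to running Python with -u for this call:
# the message appears immediately, even if the process is killed later.
print("Loading ...", flush=True)
data = [0.0] * 1_000_000  # placeholder for the real loading step
print("Loading done!", flush=True)
print(f"peak memory so far: {peak_memory_mib():.1f} MiB", flush=True)
```

Printing the peak after each stage makes it easier to see which step drives memory use when a run ends with a bare `Killed`.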
Hi dear Kawue,
I redid the miniconda install. It still has the same problem with cv2 as yesterday.
** Error 1 **
(base) [amissino@genossh:BGC] git clone https://github.com/Kawue/imzML-to-HDF5
(base) [amissino@genossh:BGC] cd imzML-to-HDF5
(base) [amissino@genossh:imzML-to-HDF5-master] $ conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: /
Warning: 2 possible package resolutions (only showing differing packages):
==> WARNING: A newer version of conda exists. <==
current version: 4.7.12
latest version: 4.8.2
Please update conda by running
$ conda update -n base -c defaults conda
Downloading and Extracting Packages
libgcc-ng-9.2.0 | 8.2 MB | 100%
nss-3.47 | 1.9 MB | 100%
liblapacke-3.8.0 | 10 KB | 100%
jasper-1.900.1 | 286 KB | 100%
gnutls-3.6.5 | 2.1 MB | 100%
zstd-1.4.4 | 989 KB | 100%
libxkbcommon-0.10.0 | 475 KB | 100%
opencv-4.2.0 | 19 KB | 100%
graphite2-1.3.13 | 109 KB | 100%
libstdcxx-ng-9.2.0 | 4.5 MB | 100%
zlib-1.2.11 | 105 KB | 100%
openssl-1.1.1d | 2.1 MB | 100%
xorg-renderproto-0.1 | 8 KB | 100%
xorg-libxrender-0.9. | 31 KB | 100%
ca-certificates-2019 | 145 KB | 100%
intel-openmp-2019.4 | 729 KB | 100%
_libgcc_mutex-0.1 | 3 KB | 100%
blosc-1.17.1 | 886 KB | 100%
pip-19.3.1 | 1.9 MB | 100%
tk-8.6.10 | 3.2 MB | 100%
pytz-2019.3 | 237 KB | 100%
ffmpeg-4.1.3 | 75.8 MB | 100%
libuuid-2.32.1 | 26 KB | 100%
gstreamer-1.14.5 | 4.5 MB | 100%
xorg-libxdmcp-1.1.3 | 18 KB | 100%
pthread-stubs-0.4 | 5 KB | 100%
readline-8.0 | 441 KB | 100%
xorg-xextproto-7.3.0 | 27 KB | 100%
certifi-2019.11.28 | 148 KB | 100%
qt-5.12.5 | 99.2 MB | 100%
libclang-9.0.1 | 22.3 MB | 100%
libwebp-1.0.2 | 938 KB | 100%
cairo-1.16.0 | 1.5 MB | 100%
libiconv-1.15 | 2.0 MB | 100%
libffi-3.2.1 | 46 KB | 100%
xorg-libsm-1.2.3 | 25 KB | 100%
libopenblas-0.3.7 | 7.6 MB | 100%
setuptools-42.0.1 | 652 KB | 100%
bzip2-1.0.8 | 396 KB | 100%
xorg-libx11-1.6.9 | 918 KB | 100%
libxcb-1.13 | 396 KB | 100%
numexpr-2.7.1 | 197 KB | 100%
ld_impl_linux-64-2.3 | 589 KB | 100%
mock-3.0.5 | 44 KB | 100%
mkl-2019.4 | 131.2 MB | 100%
giflib-5.2.1 | 73 KB | 100%
hdf5-1.10.5 | 3.1 MB | 100%
expat-2.2.9 | 191 KB | 100%
openh264-1.8.0 | 1.4 MB | 100%
xz-5.2.4 | 366 KB | 100%
libcblas-3.8.0 | 10 KB | 100%
gmp-6.2.0 | 811 KB | 100%
numpy-1.17.3 | 5.2 MB | 100%
xorg-libice-1.0.10 | 57 KB | 100%
nspr-4.25 | 1.6 MB | 100%
libtiff-4.1.0 | 568 KB | 100%
fontconfig-2.13.1 | 340 KB | 100%
lame-3.100 | 498 KB | 100%
nettle-3.4.1 | 5.7 MB | 100%
liblapack-3.8.0 | 10 KB | 100%
lz4-c-1.8.3 | 187 KB | 100%
wheel-0.33.6 | 35 KB | 100%
xorg-libxext-1.3.4 | 51 KB | 100%
wheezy.template-0.1. | 16 KB | 100%
libopencv-4.2.0 | 55.4 MB | 100%
python-dateutil-2.8. | 220 KB | 100%
xorg-libxau-1.0.9 | 13 KB | 100%
gettext-0.19.8.1 | 3.6 MB | 100%
pytables-3.6.1 | 1.5 MB | 100%
libgfortran-ng-7.3.0 | 1.7 MB | 100%
_openmp_mutex-4.5 | 5 KB | 100%
dbus-1.13.6 | 602 KB | 100%
gst-plugins-base-1.1 | 6.8 MB | 100%
six-1.13.0 | 22 KB | 100%
xorg-kbproto-1.0.7 | 26 KB | 100%
harfbuzz-2.4.0 | 1.5 MB | 100%
xorg-xproto-7.0.31 | 72 KB | 100%
pcre-8.44 | 261 KB | 100%
libxml2-2.9.10 | 1.3 MB | 100%
libblas-3.8.0 | 10 KB | 100%
py-opencv-4.2.0 | 21 KB | 100%
lzo-2.10 | 319 KB | 100%
ncurses-6.1 | 1.3 MB | 100%
pixman-0.38.0 | 594 KB | 100%
jpeg-9c | 251 KB | 100%
pyimzml-1.2.6 | 20 KB | 100%
python-3.8.0 | 38.4 MB | 100%
x264-1!152.20180806 | 1.4 MB | 100%
icu-64.2 | 12.6 MB | 100%
freetype-2.10.0 | 884 KB | 100%
sqlite-3.30.1 | 2.0 MB | 100%
libpng-1.6.37 | 343 KB | 100%
pandas-0.25.3 | 11.8 MB | 100%
libllvm9-9.0.1 | 25.1 MB | 100%
llvm-openmp-9.0.1 | 782 KB | 100%
glib-2.58.3 | 3.3 MB | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
#
#
#
(base) [amissino@genossh:imzML-to-HDF5-master] $ conda activate msiparse3
(msiparse3) [amissino@genossh:BGC] $ cd ..
(msiparse3) [amissino@genossh:BGC] $ mv input/ imzML-to-HDF5-master/
(msiparse3) [amissino@genossh:BGC] $ cd imzML-to-HDF5-master/
(msiparse3) [amissino@genossh:imzML-to-HDF5-master] $ mkdir output
(msiparse3) [amissino@genossh:imzML-to-HDF5-master] $ python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
Traceback (most recent call last):
  File "msiproc.py", line 7, in <module>
OPENCV HAS NOT BEEN INSTALLED
(msiparse3) [amissino@genossh:~] $ cd /home/genouest/inra_umr1349/amissino/miniconda3/envs/msiparse3/lib/python3.8/site-packages
(msiparse3) [amissino@genossh:site-packages] $ ls
certifi pandas-0.25.3.dist-info setuptools-42.0.1.post20191125-py3.8.egg-info
certifi-2019.11.28-py3.8.egg-info pip six-1.13.0.dist-info
cv2.cpython-38-x86_64-linux-gnu.so pip-19.3.1-py3.8.egg-info six.py
dateutil pkg_resources tables
easy_install.py __pycache__ tables-3.6.1.dist-info
mock pyimzml wheel
mock-3.0.5.dist-info pyimzML-1.2.6.dist-info wheel-0.33.6-py3.8.egg-info
numexpr python_dateutil-2.8.1.dist-info wheezy
numexpr-2.7.1.dist-info pytz wheezy.template-0.1.167-py2.7.egg-info
numpy pytz-2019.3.dist-info wheezy.template-0.1.167-py2.7-nspkg.pth
numpy-1.17.3.dist-info README.txt
pandas setuptools
(msiparse3) [amissino@genossh:imzML-to-HDF5-master] $ python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
Traceback (most recent call last):
  File "msiproc.py", line 7, in <module>
I also noticed that PyTables is distributed in the anaconda channel, not in the channels mentioned in environment.yml (conda-forge, bioconda or default). https://anaconda.org/anaconda/pytables
THANK YOU
OK, I reverted a change and would like to ask you to install miniconda and the new environment again. I downloaded miniconda on Ubuntu myself, and it now works on my system without any issues. If you still get errors, post them and I will try to solve them.
It installed fine, but I still get the same WARNING and no converted file is generated as output.
(msiparse3) [amissino@genossh:imzML-to-HDF5] $ python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
Killed
Thank you
The warning can be ignored. I added it to the code myself, and it does not interrupt anything. But I do not know why the process is killed.
On our Linux compute cluster it works without problems. It seems that your cluster has some issues.
Can you ask an administrator how to get more details about the reason for the kill? There should be an error output somewhere.
I tried to speed things up a bit and added a few more print statements to identify where problems can occur.
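If the cluster uses Slurm (as the job logs further down this thread suggest), the reason for a kill often shows up in the job's output file. The following sketch searches such a log for out-of-memory messages. The function name is hypothetical, the pattern is an assumption based on the message wording quoted later in this thread, and other schedulers will phrase it differently.

```python
import re

# Wording taken from the slurmstepd message that appears later in this thread.
OOM_PATTERN = re.compile(r"oom-kill event|out-of-memory handler", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return every log line that mentions an out-of-memory kill."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


# Example text modelled on a Slurm output file from this thread.
log_text = (
    "DataFrame creation ...\n"
    "slurmstepd: error: Detected 1 oom-kill event(s) in step 665426.batch cgroup.\n"
)
hits = find_oom_lines(log_text)
for line in hits:
    print(line)
```

In practice one would read the text from the scheduler's output file instead of a string literal.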
Hi Karsten Wüllems,
I removed the old version and installed the latest one. Please see the following error message.
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
Loading done!
m/z consistency check ... WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
m/z consistency check done!
DataFrame creation ... Killed
THANK YOU
OK, it seems your cluster stops at the DataFrame creation, most likely because of insufficient memory. Can you pass a parameter to your cluster to reserve a specific amount of memory? Around 32GB should be enough.
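A back-of-the-envelope estimate helps when choosing how much memory to reserve. The sketch below assumes the table is held as a dense array of 64-bit floats, which is an assumption about the converter rather than something confirmed here. The dimensions (22421 pixels, 46509 m/z values) are the ones the converter reports later in this thread.

```python
def dense_matrix_bytes(n_pixels, n_mz, bytes_per_value=8):
    """Memory needed for one dense copy of a pixels x m/z table."""
    return n_pixels * n_mz * bytes_per_value


n_pixels, n_mz = 22421, 46509  # dimensions reported later in this thread
size = dense_matrix_bytes(n_pixels, n_mz)
print(f"{size} bytes, about {size / 1024**3:.2f} GiB per float64 copy")
```

One copy is just under 8 GiB under these assumptions. Intermediate Python objects and temporary copies made while the table is built can multiply that several times, which may explain why the required memory differs so much between systems.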
OK, it ran successfully with 100GB of RAM, but not with 32GB or 50GB. :) 😊👍✌👌
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ sbatch --mem=32GB run_msiparse.sh
Submitted batch job 665426
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ more slurm-665426.out
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
Loading done!
m/z consistency check ... WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
m/z consistency check done!
DataFrame creation ...
/var/spool/slurmd/job665426/slurm_script: line 2: 252487 Killed python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
slurmstepd: error: Detected 1 oom-kill event(s) in step 665426.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ sbatch --mem=50GB run_msiparse.sh
Submitted batch job 665473
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ more slurm-665473.out
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
Loading done!
m/z consistency check ... WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
m/z consistency check done!
DataFrame creation ...
/var/spool/slurmd/job665473/slurm_script: line 2: 74884 Killed python -u msiproc.py -i input/dni-14jai2_pos100-700_181009-root_mean_square.imzML -o output/
slurmstepd: error: Detected 1 oom-kill event(s) in step 665473.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ sbatch --mem=100GB run_msiparse.sh
Submitted batch job 665501
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ more slurm-665501.out
No peak list specified! Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
Loading done!
m/z consistency check ... WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
m/z consistency check done!
DataFrame creation ...
DataFrame creation done
Write DataFrame ...
/home/genouest/inra_umr1349/amissino/miniconda3/envs/msiparse_A2M/lib/python3.8/site-packages/tables/path.py:155: NaturalNameWarning: object name is not a valid Python identifier: 'msi_frame_dni-14jai2_pos100-700_181009-root_mean_square'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
done. Script completed!
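A note on the NaturalNameWarning in the log above: it is a warning, not an error. PyTables only points out that the node name contains characters (here the hyphens) that prevent attribute-style access, while the data is still written. The sketch below checks a name against the identifier pattern quoted in the warning and shows one possible way to sanitize it. The helper names are hypothetical, and the converter itself does not do this.

```python
import re

# Identifier pattern as quoted in the PyTables warning message.
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_natural_name(name):
    """True if the name can be used with attribute-style (natural) access."""
    return NATURAL_NAME.match(name) is not None


def to_natural_name(name):
    """Replace every character outside [a-zA-Z0-9_] with an underscore."""
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    if not re.match(r"[a-zA-Z_]", cleaned):
        cleaned = "_" + cleaned  # identifiers must not start with a digit
    return cleaned


key = "msi_frame_dni-14jai2_pos100-700_181009-root_mean_square"
print(is_natural_name(key))
print(to_natural_name(key))
```

Renaming is optional. Tools that look up the stored key by its original name would need that name to stay unchanged.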
Do you think the conversion went well, judging by the size of the generated file or by other checks, if any exist?
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ cd input/
(msiparse_A2M) [amissino@genossh:input] $ ls
dni-14jai2_pos100-700_181009-root_mean_square.ibd dni-14jai2_pos100-700_181009-root_mean_square.imzML
(msiparse_A2M) [amissino@genossh:input] $ ls -l
total 286224
-rw-r--r-- 1 amissino inra_umr1349 200432944 Sep 17 18:04 dni-14jai2_pos100-700_181009-root_mean_square.ibd
-rw-r--r-- 1 amissino inra_umr1349 38731863 Sep 17 18:04 dni-14jai2_pos100-700_181009-root_mean_square.imzML
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ cd ..
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ cd output/
(msiparse_A2M) [amissino@genossh:output] $ ls -l
-rw-r--r-- 1 amissino inra_umr1349 400548573 Mar 6 12:03 dni-14jai2_pos100-700_181009-root_mean_square.h5
If yes, now that I have the correct file format, can I use MSI Community Detection (GRINE)?
THANKS FOR ALL YOUR HELP 👏
Wow, that's super weird. Why does it need 100GB when I can run it on my 32GB system?
How large is your .h5 file now?
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ cd ..
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ cd output/
(msiparse_A2M) [amissino@genossh:output] $ ls -l
-rw-r--r-- 1 amissino inra_umr1349 400548573 Mar 6 12:03 dni-14jai2_pos100-700_181009-root_mean_square.h5
File size is nearly 400 MB.
ok, that should be fine.
Your data has a weird measurement artifact: there is a vertical slice in the left quarter without any measurement. However, that should not be a problem for most algorithms.
The problem is that your data is unpicked and has 46,510 mass channels. On a high-spec PC, GRINE should be able to handle around 400-500. However, I will invite you to one of our software packages, which is currently private but will be published very soon. To use it, you have to re-run imzml-to-hdf5.py, as I have to add a detail to make it compatible.
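To get a feeling for why the channel count matters, the sketch below counts the maximum number of undirected edges in a graph. It assumes one node per m/z channel and at most one edge per pair of channels, which is an illustrative assumption about how the graph is built, not a description of GRINE's internals.

```python
def max_edges(n_nodes):
    """Upper bound on undirected edges between n_nodes nodes (no self-loops)."""
    return n_nodes * (n_nodes - 1) // 2


for n_channels in (500, 46510):
    print(f"{n_channels} channels -> up to {max_edges(n_channels)} edges")
```

Because the bound grows quadratically, going from a few hundred picked channels to tens of thousands of unpicked ones increases the potential edge count by roughly four orders of magnitude.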
Afterwards you have to make a second environment via the environment.yml provided with the other package. After activating it you can call:
python -r path-to-h5 -s savepath --interactive
This command will open a plot that allows you to set a threshold for peak picking via the right mouse button. If you like, you can also apply a simple deisotoping procedure via the respective button. This will generate a smaller h5 file that is suitable for GRINE.
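Conceptually, threshold-based picking keeps only the channels whose signal exceeds a chosen cut-off. The sketch below assumes the threshold is applied to the mean spectrum, which is a guess for illustration. The selection rule actually used by the package may differ, and the function names and toy data are hypothetical. Deisotoping is not covered.

```python
def mean_spectrum(rows):
    """Mean intensity per m/z channel across all pixels (rows)."""
    n_pixels = len(rows)
    return [sum(column) / n_pixels for column in zip(*rows)]


def pick_channels(mz_axis, rows, threshold):
    """Keep only channels whose mean intensity reaches the threshold."""
    means = mean_spectrum(rows)
    keep = [i for i, mean in enumerate(means) if mean >= threshold]
    picked_mz = [mz_axis[i] for i in keep]
    picked_rows = [[row[i] for i in keep] for row in rows]
    return picked_mz, picked_rows


# Hypothetical toy data: 2 pixels x 4 m/z channels.
mz_axis = [100.0, 101.5, 250.2, 300.7]
rows = [
    [5.0, 1.0, 9.0, 0.0],
    [4.0, 0.0, 8.0, 2.0],
]
picked_mz, picked_rows = pick_channels(mz_axis, rows, threshold=2.0)
print(picked_mz)
print(picked_rows)
```

Raising the threshold reduces the number of channels, and with it the size of the graph built from them.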
Also, if you want to inspect your selected channels, you can call `python msi_image_writer.py -r path-to-picked-h5 -s savepath --write_mz`. This will write PNGs of all your channels.
Everything will be ready as soon as you get the invite. I hope there will be no error with the other package. However, if so, we will resolve them.
Hi dear Karsten,
I reinstalled the msiparse env and converted the imzML file to h5 (large file). I tried to create the provim env, but it does not work.
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ sbatch --mem=100GB run_msiparse.sh
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ more slurm-719016.out
No peak list specified!
Conversion will be based on the full detection range.
Loading input/dni-14jai2_pos100-700_181009-root_mean_square.imzML
Loading done!
m/z consistency check ...
WARNING: Not all spectra have the same mz values. Missing values are filled with zeros!
m/z consistency check done!
DataFrame creation ...
DataFrame creation done
DataFrame size equals: 22421 pixels, 46509 mz-values
Write DataFrame ...
/home/genouest/inra_umr1349/amissino/miniconda3/envs/msiparse_A2M/lib/python3.8/site-packages/tables/path.py:155: NaturalNameWarning: object name is not a valid Python identifier: 'msi_frame_dni-14jai2_pos100-700_181009-root_mean_square'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
done. Script completed!
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ ls -l output/
total 478768
-rw-r--r-- 1 amissino inra_umr1349 400593311 Mar 9 14:44 dni-14jai2_pos100-700_181009-root_mean_square.h5
(msiparse_A2M) [amissino@genossh:imzML-to-HDF5-master] $ conda deactivate (base) [amissino@genossh:imzML-to-HDF5-master] cd .. (base) [amissino@genossh:MALDI-MSI] git clone https://github.com/Kawue/provim (base) [amissino@genossh:MALDI-MSI] cd provim (base) [amissino@genossh:provim] $ ls automated_matrix_detection.py fast_convert_hdf.py msi_dimension_reducer.py statistical_evalutaion.py backup interactive_annotation.py msi_image_writer.py winsorize.py basis interactive_matrix_detection.py msi_utils.py workflow_peakpicking.py Dockerfile interactive_peak_threshold.py non-code workflow_pybasis.py easypicker.py LICENSE pycache easypicker_v1.py matrix_postprocessing.py readh5.py environment.yml matrix_preprocessing.py README.md (base) [amissino@genossh:provim] $ conda env create -f environment.yml (base) [amissino@genossh:provim] $ conda env create -f environment.yml Collecting package metadata (repodata.json): done Solving environment: done Preparing transaction: done Verifying transaction: done Executing transaction: done Ran pip subprocess with arguments: ['/home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/genouest/inra_umr1349/amissino/MALDI-MSI/provim/condaenv.v_gpuf42.requirements.txt'] Pip subprocess output: Collecting pyqt5-sip Using cached PyQt5_sip-12.7.1-cp38-cp38-manylinux1_x86_64.whl (264 kB) Collecting pyqtwebengine Using cached PyQtWebEngine-5.14.0.tar.gz (47 kB) Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'error'
Pip subprocess error:
ERROR: Command errored out with exit status 1:
command: /home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/bin/python /home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpy90pw2se
cwd: /tmp/pip-install-2yh5kjae/pyqtwebengine
Complete output (6 lines):
Querying qmake about your Qt installation...
/home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/bin/qmake -query
These bindings will be built: QtWebEngineCore, QtWebEngine, QtWebEngineWidgets.
Generating the QtWebEngineCore bindings...
_in_process.py: Unable to find file "QtCore/QtCoremod.sip"
----------------------------------------
ERROR: Command errored out with exit status 1: /home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/bin/python /home/genouest/inra_umr1349/amissino/miniconda3/envs/provim/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpy90pw2se Check the logs for full command output.
CondaEnvException: Pip failed
Thank you for your help
Are you using provim on your local machine or on a cluster?
I use it on the cluster. Shouldn't I?
OK, in theory that's fine. I used it on our cluster myself, but I have to explain some things.
I activated X11 forwarding via Putty
I installed provim
(base) [amissino@genossh:provim] $ cp environment.yml environment_cp.yml
(base) [amissino@genossh:provim] $ vi environment.yml
(base) [amissino@genossh:provim] $ conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(base) [amissino@genossh:provim] $ conda activate provim
Thank you
Perfect! Now you should be able to use the commands provided above.
Hi dear Karsten,
With the following command, it ran for a very long time without printing any output or progress, so I cancelled it.
(base) [amissino@genossh:MALDI-MSI] $ mv imzML-to-HDF5-master/ provim/
(base) [amissino@genossh:MALDI-MSI] $ cd provim
(base) [amissino@genossh:provim] $ conda activate provim
(provim) [amissino@genossh:provim] $ python msi_image_writer.py -r imzML-to-HDF5-master/output/dni-14jai2_pos100-700_181009-root_mean_square.h5 -s savepath --write_mz
('imzML-to-HDF5-master/output', 'dni-14jai2_pos100-700_181009-root_mean_square.h5')
/omaha-beach/amissino/MALDI-MSI/provim/msi_utils.py:70: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
if len(h5py.File(join(root,f)).keys()) < 2:
^C
When executed with 32 GB of memory it ran, but it neither reduced the size of the *.h5 file nor generated an output file.
(provim) [amissino@genossh:provim] $ sbatch --mem=32GB run_provim.sh
Submitted batch job 746703
(provim) [amissino@genossh:provim] $ more slurm-746703.out
/omaha-beach/amissino/MALDI-MSI/provim/msi_utils.py:70: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
if len(h5py.File(join(root,f)).keys()) < 2:
('imzML-to-HDF5-master/output', 'dni-14jai2_pos100-700_181009-root_mean_square.h5')
(provim) [amissino@genossh:provim] $ ls -l imzML-to-HDF5-master/output/dni-14jai2_pos100-700_181009-root_mean_square.h5
-rw-r--r-- 1 amissino inra_umr1349 400208238 Mar 11 10:09 imzML-to-HDF5-master/output/dni-14jai2_pos100-700_181009-root_mean_square.h5
Thank you
Oh, sorry I missed a crucial thing in my comment above. It was meant to be:
python workflow_peakpicking.py -r path-to-h5 -s savepath --interactive
This way the picking tool will open. Afterwards you can call msi_image_writer.py in combination with the picked h5. If you call the writer on your original data, it will create images for all 46509 m/z channels instead of the far smaller number you get by picking.
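The intended order (pick first, then write images from the picked file) can be sketched as a small helper that only assembles the two command lines. This is a minimal sketch, not part of ProViM: the helper name build_commands is hypothetical, and it assumes the picked file ends up in the save path under the name <dataset>_autopicked.h5, which may differ on your setup.

```python
from pathlib import PurePosixPath


def build_commands(h5_path, savepath):
    """Return the picking call followed by the image-writer call."""
    dataset = PurePosixPath(h5_path).stem  # file name without ".h5"
    # Assumption: the picked file is saved in savepath as <dataset>_autopicked.h5.
    picked = str(PurePosixPath(savepath) / f"{dataset}_autopicked.h5")
    pick = ["python", "workflow_peakpicking.py",
            "-r", h5_path, "-s", savepath, "--interactive"]
    # The writer reads the picked file, not the original data.
    write = ["python", "msi_image_writer.py",
             "-r", picked, "-s", savepath, "--write_mz"]
    return [pick, write]


commands = build_commands("output/sample.h5", "savepath")
for cmd in commands:
    print(" ".join(cmd))
```

The point of the second command is only that its -r argument differs from the first one; check the real name and location of the picked file before running it.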
OK. I added the workflow_peakpicking.py call to run_provim.sh. I think that it ran successfully, but I don't understand what it prints.
(provim) [amissino@genossh:provim] $ more run_provim.sh
python workflow_peakpicking.py -r imzML-to-HDF5-master/output/dni-14jai2_pos100-700_181009-root_mean_square.h5 -s savepath --interactive
python msi_image_writer.py -r imzML-to-HDF5-master/output/dni-14jai2_pos100-700_181009-root_mean_square.h5 -s savepath --write_mz
(provim) [amissino@genossh:provim] $ sbatch --mem=32GB run_provim.sh
Submitted batch job 747090
(provim) [amissino@genossh:provim] $ more slurm-747090.out
/omaha-beach/amissino/MALDI-MSI/provim/msi_utils.py:70: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
if len(h5py.File(join(root,f)).keys()) < 2:
('imzML-to-HDF5-master/output', 'dni-14jai2_pos100-700_181009-root_mean_square.h5')
pick_on_merge argument is False. If the HDF5 stores more than one data set picking will be done one every single one independent!
Traceback (most recent call last):
File "workflow_peakpicking.py", line 201, in
I think that msi_image_writer.py created 35159 molecular images, not 46509. It seems to have omitted 11,350 molecular images. Did it work properly?
(provim) [amissino@genossh:provim] $ ls -1 savepath/dni-14jai2_pos100-700_181009-root_mean_square-images/ | wc -l
35159
THANKS FOR ALL YOUR HELP 👏
Regarding the error I guess the most important line is: _tkinter.TclError: couldn't connect to display "localhost:14.0".
It seems like there is a problem with the x11 forwarding. Do you use a Windows or a Linux PC to connect to your cluster?
I tried it with a Windows PC: I installed Xming, started it, opened a PowerShell, typed $env:DISPLAY= 'localhost:0.0' and then used ssh in PowerShell to connect to my server.
However, if this step still causes problems, just try to use it on your local machine. Interactive remote matplotlib communication can sometimes be a bit tricky.
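Before starting the interactive picking tool on a remote machine, it can help to check whether a Tk window can be opened at all, since the error above is a tkinter.TclError raised when the display cannot be reached. The following is a minimal sketch of such a check; the function name can_open_display is hypothetical and not part of ProViM.

```python
def can_open_display():
    """Return True if a Tk window can be created in this session."""
    try:
        import tkinter
    except ImportError:
        # Python was built without Tk support.
        return False
    try:
        root = tkinter.Tk()  # fails with TclError if no display is reachable
    except tkinter.TclError:
        return False
    root.destroy()
    return True


if can_open_display():
    print("Display reachable: the interactive tool should be able to open.")
else:
    print("No display reachable: check X11 forwarding or run locally.")
```

If this reports no reachable display inside the same shell or batch job that runs workflow_peakpicking.py, the picking window will not be able to open there either.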
Regarding the images, you said that it needed a lot of time and that you interrupted it, correct? That would explain why only 35k images were written instead of 46k.
No need to say thank you. I am happy that people are interested in the tool.
Yes, I used Windows to connect to the computer cluster.
When I tried to install the provim environment via the Anaconda shell or the Windows cmd, the result was the same, as shown here.
PS D:\Parts\Metabolomics\DATA\trated_data\ProVim> conda env create -f .\environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
OK, so to solve the mkl problem, please move the respective line in the environment.yml, i.e. - mkl=2019.5, into the pip section, right below - pyqtwebengine, like in my updated version.
Regarding the images, that's most likely because I round the m/z values to 3 decimal places for shorter file names. For picked data that is no problem in most cases, but for unpicked data it results in some images being overwritten. However, I have now updated it to 5 places.
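The overwriting effect is easy to reproduce in isolation. The sketch below is not the image writer's actual code: the function names, the .png extension and the three m/z values are made up for illustration. It only shows that neighbouring channels collapse onto the same file name at 3 decimal places but stay distinct at 5.

```python
def image_names(mz_values, places):
    """Build one file name per m/z value with a fixed number of decimals."""
    # Assumption: names are derived only from the rounded m/z value.
    return [f"{mz:.{places}f}.png" for mz in mz_values]


def count_overwritten(mz_values, places):
    """Count images lost because two channels share one file name."""
    names = image_names(mz_values, places)
    return len(names) - len(set(names))


mz = [100.00012, 100.00049, 100.00120]  # hypothetical unpicked channels
print(count_overwritten(mz, 3))
print(count_overwritten(mz, 5))
```

With densely sampled, unpicked spectra many channels differ only beyond the third decimal, which is consistent with fewer image files than channels being written.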
Hi dear Karsten,
I reinstalled provim on our computer cluster. It created 46509 molecular images as we expected.
(provim) [amissino@genossh:provim] $ ls -1 savepath/dni-14jai2_pos100-700_181009-root_mean_square-images/ | wc -l
46509
OK, it seems that the rounding was the issue. Did you also try the picking procedure again? Like I said, GRINE can handle 500 channels at most. The recommendation to use the image writing was "just" for quality control purposes. In general I would always recommend using the image writer after picking to check whether the picked peaks contain useful information.
Btw, I just removed you from the collaborator list of ProViM. We just handed our manuscript in and it is now publicly available.
Yes, I reran the picking procedure. I ran both workflow_peakpicking.py and msi_image_writer.py twice, and both times they gave the same result (46509 images).
OK. The question is how do I automatically select the 500 most relevant images from the 46509 images, especially since I would have more images in other samples (infected plants)?
Well, the picking workflow does not currently support a fixed number, although this is a great idea, either using intensity or the measure of spatial chaos as an indicator. However, if you start the picking workflow it will open a "mini-tool" that shows you the mean spectrum of your data. You will see a horizontal line as you move the mouse. The aim is to help you choose a good threshold. You can select a threshold with a right click. Thereafter you have to use the deisotope button. If you do not want to remove any isotopes, just set minimum and maximum to 0.0. On the lower right you will see how many channels are left after picking. Proceed by using the run button. There will be a new data set called datasetname_autopicked.h5. You should proceed using this one for GRINE.
If you have multiple samples you should either use the same threshold for all of them, try to get the same number of images for all of them, or use the respective command line argument to pick on the mean spectrum of all of them together. Every parameter is explained via python workflow_peakpicking.py -h.
Doing a comparative analysis with GRINE is hard anyway. Therefore I would recommend picking each data set and analysing them separately. You may be able to identify interesting groups of molecules, which you can then compare manually afterwards.
So if there is no autopicked file after using the picking procedure, something went wrong.
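To make the two ideas concrete, here is a minimal sketch of threshold picking on a mean spectrum next to the fixed-number variant asked about above. It is not ProViM's implementation: the function names and the toy spectrum are hypothetical, deisotoping is left out, and only mean intensity is used as the ranking criterion.

```python
def pick_by_threshold(mean_spectrum, threshold):
    """Keep m/z channels whose mean intensity reaches the threshold."""
    return [mz for mz, intensity in mean_spectrum.items() if intensity >= threshold]


def pick_top_n(mean_spectrum, n):
    """Keep the n channels with the highest mean intensity, sorted by m/z."""
    ranked = sorted(mean_spectrum, key=mean_spectrum.get, reverse=True)
    return sorted(ranked[:n])


# Toy mean spectrum: m/z value -> mean intensity (made-up numbers).
mean_spectrum = {100.1: 0.02, 150.2: 0.30, 200.3: 0.07, 250.4: 0.01, 300.5: 0.12}

print(pick_by_threshold(mean_spectrum, 0.04))
print(pick_top_n(mean_spectrum, 2))
```

A fixed-number selection such as pick_top_n with n set to 500 would guarantee the same channel count for every sample, whereas a shared threshold keeps the criterion identical but lets the count vary between samples.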
Hi dear Karsten,
I hope your area has been only slightly affected by covid and that you are staying well.
I applied two thresholds (as shown below) and the deisotoping option. This generates a very small .h5 file: 9.6 MB for the first test and 16 MB for the second, from a parent .h5 file of 390 MB.
I also successfully installed provim, msi-community-detection and COBI-GRINE using Docker containers (Docker Toolbox).
In order to explore community structures in MSI image networks, I want to generate the community detection/graph information, dimension reduction information and similarity matrix information needed for COBI-GRINE.
I am having difficulty managing these files.
$ docker login
Authenticating with existing credentials...
Login Succeeded
$ docker run -v -d ../output/peakpicking/dni-14jai2_pos100-700_181009-root_mean_square_autopicked_deisotroping_treshold_0.04.h5 --rm grine/msicommunitydetection -cm -sm pearson -tm modularity_weighted -dr ica -p ../output/msi-community-detection/
Take care of yourself and your loved ones.
Hey Anani, the covid situation is fine in my area. Hope you are good too.
Nice that you got the .h5 files going. Some details regarding your docker call.
-cm should either be eigenvector or louvain. I recommend louvain.
-tm [...]. However, this greatly depends on the data.
-dr [...]. In most cases UMAP is far superior to other methods for detecting inherent structures.
-p needs to address the path + file name, like path/someName.json.
-v requires the format path-to-data:/data. So let's assume your .h5 is in C:\anani\data\output\peakpicking\blub.h5. Then your -v argument should look like -v C:\anani\data\output\peakpicking:/data
-d should appear after --rm, as -v and --rm are docker flags, while the others are flags of my program.
So a complete example call would look like
docker run -v C:\anani\data\output\peakpicking:/data --rm grine/msicommunitydetection -d /data/dni-14jai2_pos100-700_181009-root_mean_square_autopicked_deisotroping_treshold_0.04.h5 -p /data/msi-community-detection/someName.json -cm louvain -sm pearson -tm statistics -tp mean std 1 -dr umap
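The ordering rule (docker flags before the image name, program flags after it) can be sketched by assembling the call as an argument list. This is only an illustration built from the flags and values in the example call above; the helper name docker_call and its parameters are hypothetical.

```python
def docker_call(host_dir, h5_name, json_name, image="grine/msicommunitydetection"):
    """Assemble the docker run call as an argument list."""
    # Docker flags: must come before the image name.
    docker_flags = ["-v", f"{host_dir}:/data", "--rm"]
    # Program flags: passed to the tool inside the container,
    # so all paths refer to the mounted /data directory.
    program_flags = [
        "-d", f"/data/{h5_name}",
        "-p", f"/data/{json_name}",
        "-cm", "louvain",
        "-sm", "pearson",
        "-tm", "statistics", "-tp", "mean", "std", "1",
        "-dr", "umap",
    ]
    return ["docker", "run", *docker_flags, image, *program_flags]


call = docker_call(
    r"C:\anani\data\output\peakpicking",
    "blub.h5",
    "msi-community-detection/someName.json",
)
print(" ".join(call))
```

Keeping the two groups in separate lists makes it harder to put -d in front of --rm by accident, and it makes explicit that -d and -p use container paths rather than host paths.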
Today I also pushed some updates to the msi-community-detection and the cobi-grine repo.
Hi Karsten,
Good. We have completely resumed work, but we fear a second wave of the epidemic after the holidays.
I get a WARNING and an error (which I may not have noticed during the first build) with docker build -t grine/msicommunitydetection .
$ ls -l
total 48
-rw-r--r-- 1 REN-1349-A104+adminigepp 197121 12952 juil. 22 12:01 A2M_GRINE_run.txt
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 22 12:00 grine-v2-dev/
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 21 18:15 input/
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 22 10:47 msi-community-detection/
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 21 19:12 output/
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 22 12:00 provim/
drwxr-xr-x 1 REN-1349-A104+adminigepp 197121 0 juil. 22 12:00 whide-v2/
$ cd msi-community-detection
$ docker build -t grine/msicommunitydetection .
Sending build context to Docker daemon 355.8kB
Step 1/8 : FROM continuumio/anaconda3
---> bdb4a7e92a49
Step 2/8 : COPY environment.yml .
---> 9348cce04bb6
Step 3/8 : COPY main.py .
---> 5e12dbcf4f34
Step 4/8 : COPY /kode ./kode
---> 69fc7b34b2b7
Step 5/8 : RUN conda env create -f environment.yml
---> Running in 73f2f79c715e
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
Downloading and Extracting Packages
cloudpickle-1.2.2 | 23 KB | ########## | 100%
pthread-stubs-0.4 | 5 KB | ########## | 100%
pcre-8.44 | 261 KB | ########## | 100%
[.....]
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
==> WARNING: A newer version of conda exists. <==
current version: 4.8.2
latest version: 4.8.3
Please update conda by running
$ conda update -n base -c defaults conda
Ran pip subprocess with arguments: ['/opt/conda/envs/grine/bin/python', '-m', 'pip', 'install', '-U', '-r', '/condaenv.19t3nvcq.requirements.txt']
Pip subprocess output:
Requirement already up-to-date: pyqt5-sip==4.19.18 in /opt/conda/envs/grine/lib/python3.7/site-packages (from -r /condaenv.19t3nvcq.requirements.txt (line 1)) (4.19.18)
Requirement already up-to-date: pyqtwebengine==5.12.1 in /opt/conda/envs/grine/lib/python3.7/site-packages (from -r /condaenv.19t3nvcq.requirements.txt (line 2)) (5.12.1)
To activate this environment, use
$ conda activate grine
To deactivate an active environment, use
$ conda deactivate
Error processing tar file(exit status 1): write /opt/conda/envs/grine/lib/libmkl_avx512_mic.so: no space left on device
Should I delete the first version or does the second build automatically replace it?
I did however try to run it.
$ docker run -v D:\Parts\DATA\Metabolomics\rawData\METAPHOR\MALDI-MSI\Image_File\GRINE\output\peakpicking:/data --rm grine/msicommunitydetection -d /data/dni-14jai2_pos100-700_181009-root_mean_square_autopicked_deisotroping_treshold_0.04.h5 -p /data/msi-community-detection/someName.json -cm louvain -sm pearson -tm statistics -tp mean std 1 -dr umap
C:\Program Files\Docker Toolbox\docker.exe: Error response from daemon: invalid mode: /data. See 'C:\Program Files\Docker Toolbox\docker.exe run --help'.
I think one source of this problem is the format in which the host directory is passed to the Docker container.
I know the path needs to be converted from the Windows API format (used by Windows 10) to POSIX (used by Git Bash), and back to the Windows API format used by Docker for Windows. I use cmd //c echo with the path printed by pwd to find the correct path format. I think that the direction of the slashes is not correct.
But it doesn't work (the error is python: can't open file 'main.py': [Errno 2] No such file or directory).
$ pwd
/d/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE
$ cmd //c echo /d/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE/output/peakpicking/
D:/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE/output/peakpicking/
$ docker run -v D:/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE/output/peakpicking/ --rm grine/msicommunitydetection -d D:/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE/output/peakpicking/dni-14jai2_pos100-700_181009-root_mean_square_autopicked_deisotroping_treshold_0.04.h5 -p D:/Parts/DATA/Metabolomics/rawData/METAPHOR/MALDI-MSI/Image_File/GRINE/output/peakpicking/msi-community-detection/someName.json -cm louvain -sm pearson -tm statistics -tp mean std 1 -dr umap
python: can't open file 'main.py': [Errno 2] No such file or directory
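For reference, the drive-letter conversion discussed above (from D:\... or D:/... to the /d/... form that pwd prints in Git Bash) can be sketched as follows. The function name to_posix_style is hypothetical, and whether Docker Toolbox accepts the converted form as a -v source on a given setup is an assumption that still has to be verified; the sketch only shows the string conversion itself.

```python
from pathlib import PureWindowsPath


def to_posix_style(windows_path):
    """Convert a Windows drive path to the /d/... form used by Git Bash."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()  # "D:" -> "d"
    # path.parts[0] is the drive root ("D:\\"); the rest are the folders.
    return "/" + "/".join([drive, *path.parts[1:]])


print(to_posix_style(r"D:\Parts\DATA\Metabolomics"))
print(to_posix_style("D:/Parts/DATA/Metabolomics"))
```

Both spellings of the same Windows path give the same POSIX-style result, so the direction of the slashes in the input does not matter for this conversion.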
Have a nice day
Dear Karsten Wüllems,
I'm a Ph.D. student in France working on Brassica napus (rapeseed) specialized metabolites against pathogens. I want to apply your pipeline "Analysis of GRaph mapped Image data NEtworks" to my data. We have disease-resistant and disease-susceptible plants, and we want to find feature communities within a network built from imaging of cross-sectioned tissue (infected/non-infected).
Your pipeline can identify groups of molecules with distributions that correlate with plant anatomical structures and help us to better understand the metabolic resistance of rapeseed to pathogens.
We have: • imzML files collected by FTICR (MALDI imaging, FTICR SolariX 7T Paracell combisource ESI/MALDI, Bruker Daltonics).
Have you shared the compiled pipeline, a step-by-step guide or the command lines used? I would be grateful if you would let me know.
I sincerely thank you for your answer. Respectfully yours,