ethanbass / chromConverter

Parsers for chromatography data in R (HPLC-DAD/UV, GC-FID, MS)
https://ethanbass.github.io/chromConverter/
GNU General Public License v3.0

Python bindings don't work correctly in latest version of RStudio without altering Python settings (ModuleNotFoundError) #13

Open ethanbass opened 1 year ago

ethanbass commented 1 year ago

Recent versions of RStudio changed the way reticulate works, as discussed in this thread, and the change interferes with chromConverter's Python bindings. chromConverter will still load, but Python-based parsers will likely be unavailable if a project is loaded. Trying to access a Python parser then raises a ModuleNotFoundError. As far as I can tell, this is a bug in RStudio rather than chromConverter (though RStudio developers seem to think this is the expected behavior).

This issue can apparently be resolved by unchecking a box in the RStudio settings. Open the RStudio settings and navigate to the Python pane (Tools > Global Options > Python), uncheck the box that says "Automatically activate project-local Python environments", and click Apply. RStudio must then be restarted for the setting to take effect.
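To confirm which interpreter reticulate has actually bound to after changing the setting, a check along these lines may help. This is only a sketch: it assumes the reticulate package is installed, and `check_python_module` is a hypothetical helper name, not part of chromConverter.

```r
# Hypothetical helper (not part of chromConverter): report the Python
# interpreter reticulate is bound to and whether a module can be found.
check_python_module <- function(module = "rainbow") {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("The reticulate package is required for this check.")
  }
  list(
    # Path of the interpreter reticulate is using in this session
    python = reticulate::py_config()$python,
    # TRUE if the module can be found by that interpreter
    module_available = reticulate::py_module_available(module)
  )
}
```

Calling `check_python_module("rainbow")` in a fresh R session should show whether the interpreter path points at the expected environment.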

nathan-loves-soil commented 3 months ago

Hi Ethan, I'm trying to use the rainbow parser in RStudio for a ChemStation data file with an ".MS" extension.

dat <- chromConverter::read_chroms(path_to_files, format_in = "chemstation", parser = "rainbow")

Even after unchecking the box described in the post above, I'm still getting the following error:

Warning in chromConverter::read_chroms(path_to_files, format_in = "chemstation", :
  Error in py_module_import(module, convert = convert) :
  ModuleNotFoundError: No module named 'rainbow'
Run reticulate::py_last_error() for details.
The following chromatograms could not be interpreted: 1

I'm pretty sure I've got the right package for Python installed (rainbow-api), and I've told reticulate to use the latest version of Python. Any thoughts where I might be going wrong?

(also sorry if this isn't the correct place to ask this question, feel free to move this query somewhere else) Thank you,

Nathan

ethanbass commented 3 months ago

Hi Nathan, Do you have miniconda installed? Could you try running chromConverter::configure_python_environment("rainbow") and then post the output here? Ethan

nathan-loves-soil commented 2 months ago

Hi Ethan,

No I didn't have miniconda installed, but I do now. I ran the line that you provided, and it gave me the option to install it then, so I did. Output was very long, so I won't post it here. I restarted the R session and then tried running my code again, but got the same error as before.

I just ran your line again, here is the output:

chromConverter::configure_python_environment("rainbow")

C:\Users\20373478\OneDrive - Curtin\All R Projects\py-GC-MS Library Searches\py_GC_MS_reports>CALL "C:\Users\20373478\AppData\Local\r-miniconda\condabin\activate.bat" "chromConverter"

C:\Users\20373478\OneDrive - Curtin\All R Projects\py-GC-MS Library Searches\py_GC_MS_reports>conda.bat activate "chromConverter"
Requirement already satisfied: rainbow-api in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (1.0.9)
Requirement already satisfied: numpy in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from rainbow-api) (2.0.0)
Requirement already satisfied: matplotlib in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from rainbow-api) (3.9.1)
Requirement already satisfied: lxml in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from rainbow-api) (5.2.2)
Requirement already satisfied: pandas in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from rainbow-api) (2.2.2)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (1.2.1)
Requirement already satisfied: cycler>=0.10 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (4.53.1)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (1.4.5)
Requirement already satisfied: packaging>=20.0 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (24.1)
Requirement already satisfied: pillow>=8 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (10.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (3.1.2)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from matplotlib->rainbow-api) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from pandas->rainbow-api) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from pandas->rainbow-api) (2024.1)
Requirement already satisfied: six>=1.5 in c:\users\20373478\appdata\local\r-miniconda\envs\chromconverter\lib\site-packages (from python-dateutil>=2.7->matplotlib->rainbow-api) (1.16.0)
Error in py_module_import(module, convert = convert) :
  ModuleNotFoundError: No module named 'rainbow'
Run reticulate::py_last_error() for details.

Thank you for your assistance :)

ethanbass commented 2 months ago

This seems to be a reticulate configuration issue, since you have rainbow installed in the chromConverter environment, but reticulate isn't finding it. I'm not sure exactly what the issue might be though. Did you restart your R session after installing miniconda and all that? It could be that reticulate has already loaded a different python environment in your R session and can't switch to the chromConverter environment.
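One way the situation described above can arise is an explicit interpreter pin earlier in the session, since reticulate binds to a single Python per R session. The sketch below scans the lines of a script for calls to reticulate's `use_python()`, `use_virtualenv()`, `use_condaenv()`, or `use_miniconda()`. `find_python_pins` is a hypothetical helper name, and the check is deliberately crude: it only skips lines that start with a comment.

```r
# Hypothetical helper: return the line numbers of uncommented calls that
# pin reticulate to a particular Python interpreter or environment.
find_python_pins <- function(lines) {
  pattern <- "use_(python|virtualenv|condaenv|miniconda)\\s*\\("
  hits <- grep(pattern, lines)
  # Ignore lines that are commented out entirely
  hits[!grepl("^\\s*#", lines[hits])]
}

# Example script contents (made up for illustration)
script <- c(
  "library(reticulate)",
  "reticulate::use_python(\"C:/Python312/python.exe\")",
  "# use_condaenv(\"some_env\")",
  "dat <- chromConverter::read_chroms(path_to_files)"
)
pins <- find_python_pins(script)
```

For a real script, the input could come from `readLines()` on the script file. Any line reported here is worth commenting out before restarting the R session.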

nathan-loves-soil commented 2 months ago

Oh yep, okay. Thanks, that is sorted now. I realised I had a line specifying which version of Python to use [ reticulate::use_python() ], so I've commented that out and the script runs. However, there's a new error; it seems to be having trouble parsing the file, maybe?

path_to_files <- "C:\\gcms\\1\\data\\py_GC_MS\\B121468.D\\DATA.MS"  
dat <- chromConverter::read_chroms(path_to_files, format_in = "chemstation", parser = "rainbow")

C:\Users\20373478\OneDrive - Curtin\All R Projects\py-GC-MS Library Searches\py_GC_MS_reports>CALL "C:\Users\20373478\AppData\Local\r-miniconda\condabin\activate.bat" "C:\Users\20373478\AppData\Local\r-miniconda\envs\r-reticulate"

C:\Users\20373478\OneDrive - Curtin\All R Projects\py-GC-MS Library Searches\py_GC_MS_reports>conda.bat activate "C:\Users\20373478\AppData\Local\r-miniconda\envs\r-reticulate"
Requirement already satisfied: aston in c:\users\20373478\appdata\local\r-miniconda\envs\r-reticulate\lib\site-packages (0.7.1)
Requirement already satisfied: numpy in c:\users\20373478\appdata\local\r-miniconda\envs\r-reticulate\lib\site-packages (2.0.0)
Requirement already satisfied: scipy>=1.2.0 in c:\users\20373478\appdata\local\r-miniconda\envs\r-reticulate\lib\site-packages (from aston) (1.14.0)
Done!
Warning in chromConverter::read_chroms(path_to_files, format_in = "chemstation", :
  Error in py_call_impl(callable, call_args$unnamed, call_args$named) :
  ValueError: strides is incompatible with shape of requested array and size of buffer
Run reticulate::py_last_error() for details.

The following chromatograms could not be interpreted: 1

Thanks again!!

ethanbass commented 2 months ago

Ah, I see. That would do it. Can you send a copy of the file? It could be a format that isn't yet supported by rainbow (there are multiple .MS formats produced by different versions of ChemStation and OpenLab). Also, do you happen to know what version of ChemStation the file came from?

nathan-loves-soil commented 2 months ago

GitHub doesn't like the .MS file type being attached in a comment. What would be your preferred way for me to send it to you?

Unfortunately I don't know what version of the software it has come from. I suspect it's pretty old as the GC-MS that we used is fairly old. If it's important to know, I can find out next week what version the computer attached to the GC-MS uses. The version that I have on my PC which can read the file is MSD ChemStation F.01.03.2357.

Thank you Ethan!

ethanbass commented 2 months ago

If you want, you could email it to me at @.***


ethanbass commented 2 months ago

Hi Nathan, I just found your file in my spam folder. I am also getting the same error message with rainbow, but the entab parser seems to be able to read your file (see installation instructions here: https://github.com/ethanbass/chromConverter/?tab=readme-ov-file#entab). Would this be an option for you? I will also try to look into where rainbow is running into problems, but I'm not sure when I'll get around to it. You could also post an issue on the rainbow GitHub page (https://github.com/evanyeyeye/rainbow/) if you like. It's possible they might get to it sooner.

Ethan
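The "try rainbow, then fall back to entab" approach suggested above could be scripted roughly as follows. This is a sketch under assumptions: `read_with_fallback` and its `reader` argument are hypothetical names, and because the failure earlier in this thread surfaced as a warning rather than an error, the sketch treats any warning as a failed attempt, which may be too strict for files that parse with harmless warnings.

```r
# Hypothetical helper: try each parser in turn and return the first result
# that is produced without an error or a warning.
read_with_fallback <- function(path, parsers = c("rainbow", "entab"), reader) {
  for (p in parsers) {
    result <- tryCatch(
      reader(path, parser = p),
      error = function(e) NULL,    # parser failed outright
      warning = function(w) NULL   # e.g. "could not be interpreted"
    )
    if (!is.null(result)) {
      return(list(parser = p, data = result))
    }
  }
  stop("None of the parsers could read: ", path)
}
```

With chromConverter installed, `reader` could be something like `function(path, parser) chromConverter::read_chroms(path, format_in = "chemstation", parser = parser)`.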

nathan-loves-soil commented 2 months ago

Hi Ethan, I will give that a go this week! I'll let you know how I go with using the entab parser. Thanks so much for your help

nathan-loves-soil commented 2 months ago

Hey Ethan, just letting you know that the entab parser worked for me! I really appreciated your willingness to help with troubleshooting this with me. Thanks again, Nathan

ethanbass commented 2 months ago

Wonderful. Glad to hear it!! Ethan


nathan-loves-soil commented 1 month ago

Hi Ethan,

Me again, sorry. I had a look around the internet for a solution first but I couldn't find one.

I was just wondering if you have any advice about the GC-MS file once it's parsed. I think I am missing something.

Here is the structure of the file:

str(dat)
List of 1
 $ C:\gcms\1\data\py_GC_MS\B121468\DATA.MS:'data.frame': 1271906 obs. of 3 variables:
  ..$ rt       : num [1:1271906] 0.0969 0.0969 0.0969 0.0969 0.0969 0.0969 0.0969 0.0969 0.0969 0.0969 ...
  ..$ mz       : num [1:1271906] 490 475 417 360 356 ...
  ..$ intensity: num [1:1271906] 280 204 309 208 227 448 417 222 295 326 ...
  ..- attr(*, "sample_name")= chr "Nathan yellow soil trial"
  ..- attr(*, "sample_id")= num 75
  ..- attr(*, "detector")= chr "GC/MS Ins"
  ..- attr(*, "detector_range")= chr ""
  ..- attr(*, "method")= chr "PE4SC40S"
  ..- attr(*, "operator")= chr ""
  ..- attr(*, "run_datetime")= POSIXct[1:1], format: "2024-03-24 10:15:00"
  ..- attr(*, "time_interval")= logi NA
  ..- attr(*, "time_unit")= chr "Minutes"
  ..- attr(*, "source_file")= chr "C:\gcms\1\data\py_GC_MS\B121468.D\ DATA.MS"
  ..- attr(*, "data_format")= chr "wide"
  ..- attr(*, "parser")= chr "entab"
  ..- attr(*, "format_out")= chr "matrix"

The $intensity part seems to show intensity for the total ion chromatogram, as when it is plotted against $rt (retention time), the plot matches the chromatogram that I see in the Chemstation software. However, I seem to be missing another 'intensity' part of the file, which gives the intensities (abundance) for the mass spectra; the mass to charge ratio seems to be stored in the $mz part of the file.

I'm just wondering what I might be doing wrong or what I'm missing. I was hoping to get R to do compound library searches and some multivariate statistics. Any advice would be much appreciated. Thanks again for your help so far.

Kind regards,

Nathan


ethanbass commented 1 month ago

Hi Nathan,

The intensity should be the intensity at each mass fragment as specified by the mz column. You would calculate the TIC by summing over all the mz values at each time point, e.g.:

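library(dplyr)
x <- dat[[1]]  # long-format data frame with columns rt, mz, intensity
# Sum the intensities of all m/z values within each scan (retention time)
tic <- x |> group_by(rt) |> summarize(intensity = sum(intensity))
plot(tic$rt, tic$intensity, type = "l")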
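For anyone who prefers to avoid the dplyr dependency, the same summation can be sketched in base R with `aggregate()`. The toy data frame below is made up for illustration and only mimics the three columns (`rt`, `mz`, `intensity`) of the parsed file.

```r
# Toy long-format data with the same columns as the parsed file
# (values are invented for illustration)
x <- data.frame(
  rt        = c(0.1, 0.1, 0.1, 0.2, 0.2),
  mz        = c(50, 51, 52, 50, 51),
  intensity = c(10, 20, 30, 5, 15)
)

# Sum intensity over all m/z values within each scan to get the TIC
tic <- aggregate(intensity ~ rt, data = x, FUN = sum)
```

Here `tic` has one row per retention time, so it can be plotted the same way as the dplyr result.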

You should be able to extract the mass spectrum at any retention time by filtering on time. For example, to get the mass spectrum of the first scan, you could do:

times <- unique(x$rt)
spec <- x[x$rt == times[1], c("mz", "intensity")]  # keep mz and intensity, drop rt
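Since scan times rarely match a round number exactly, it may be more convenient to pick the scan closest to a target retention time. The helper below is a sketch of that idea: `get_spectrum` is a hypothetical name, and the toy data are invented for illustration.

```r
# Hypothetical helper: return the spectrum (mz, intensity) of the scan
# whose retention time is closest to rt_target.
get_spectrum <- function(x, rt_target) {
  times <- unique(x$rt)
  nearest <- times[which.min(abs(times - rt_target))]
  x[x$rt == nearest, c("mz", "intensity")]
}

# Toy long-format data (values are invented for illustration)
x <- data.frame(
  rt        = c(0.1, 0.1, 0.1, 0.2, 0.2),
  mz        = c(50, 51, 52, 50, 51),
  intensity = c(10, 20, 30, 5, 15)
)
spec <- get_spectrum(x, rt_target = 0.19)
```

In this toy example the nearest scan is the one at `rt = 0.2`, so `spec` holds its two m/z values and their intensities.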

Here is a simple plot function you could use to plot the mass spectrum in base R:

plot_spec <- function(spec, lab_int = 0.2, digits = 1){
  # Draw each m/z value as a vertical line
  plot(spec, type = "h")
  # Label only the peaks taller than lab_int times the largest peak
  lab.idx <- which(spec$intensity > lab_int * max(spec$intensity))
  text(spec$mz[lab.idx], spec$intensity[lab.idx],
       round(spec$mz[lab.idx], digits), offset = 0.25, pos = 3, cex = 0.5)
}

So you should be able to do: plot_spec(spec).

Hopefully that helps?

Ethan

Also, I'm not sure what you were thinking of doing for database search? I have some functions in my mzinspectr package (https://github.com/ethanbass/mzinspectr/) that you could check out, but it's still pretty poorly documented and isn't really oriented toward analyzing raw GC-MS data, at least in the current iteration.