draeger-lab / pymCADRE

pymCADRE enables the reconstruction of tissue-specific metabolic models in Python using transcriptomic data and information of the network topology.
GNU General Public License v3.0
12 stars 5 forks source link

RuntimeWarning Error During Execution of pymCADRE's 'rank_reactions' #3

Closed Rohak72 closed 1 year ago

Rohak72 commented 1 year ago

Hi! I'm currently trying to apply pymCADRE to build a context-specific metabolic model. However, when I run the 'rank_reactions' method with my gene ID list (G), ubiquity scores (U), and reaction confidence scores specified, I'm met by a RuntimeWarning error that reads: 'All-NaN axis encountered'. Below is a screenshot of the full traceback:

Screen Shot 2023-08-21 at 6 07 36 PM

What do you think could be going on here? If you would like me to send the model and inputs I'm using, I would be more than happy to do so. Any clarification/feedback would be greatly appreciated, thanks!

-Rohak

Rohak72 commented 1 year ago

@NantiaL Any advice?

NantiaL commented 1 year ago

Hey @Rohak72, please excuse my late response. I would be happy to test your inputs if you would like to send them to me. Otherwise, I guess this error comes from the format of your input file.

Rohak72 commented 1 year ago

Hi Nantia,

No worries at all, thanks for your response. Currently, my script works just fine until the last readout: "rank_reactions has finished"; at this point, I'm met with a warning (from Cobra) saying, "UserWarning: need to pass in a list".

In terms of the inputs, I have attached all of the relevant files: (1) a link to my metabolic model [HumanGEM.xml], (2) a CSV containing the gene Entrez IDs and ubiquity scores [inputData.csv], and (3) a CSV containing the reaction confidence scores [rxnConfidenceScores.csv]. In order to successfully extract each of the required input pieces, I think the following Python code should work:

model = cobra.io.read_sbml_model("Human-GEM.xml") G = list(pd.read_csv("InputData.csv")["EntrezID"].values) U = list(pd.read_csv("InputData.csv")["GeneCall"].values) rxnConfidenceScores = np.float64(list(rxnConfidenceScores["ConfidenceLevel"]))

Please let me know if you are able to use these files properly!

I also had a few queries about my general approach. (1) The Human GEM I'm using has all of its genes mapped to Ensembl IDs; I actually used those IDs (instead of Entrez) to run the script and everything seemed to work just fine. Is this method valid? If so, I can provide the Ensembl IDs separately. (2) I didn't quite understand how to generate the ubiquity scores. Currently, my gene calls are 1 (top 15% of expression), 0.5 (middle 70%), and 0 (bottom 15% of expression). Is this suboptimal and if so, how could I more rigorously define the ubiquity scores? Any clarification regarding these points would be tremendously helpful, thanks!

If you need any more information or new input data, please let me know. I can also send along the script I'm currently as a reference. Thanks again for your time, I really appreciate it!

Best, Rohak

inputData.csv. rxnConfidenceScores.csv. Link to GEM: https://github.com/SysBioChalmers/Human-GEM/blob/main/model/Human-GEM.xml.

NantiaL commented 1 year ago

Hey Rohak,

Based on the format of your input data, and in the case of the variable U, your suggested Python code to read the file is correct. Alternatively, you could use only

U = pd.read_csv('/Users/leonidou/Downloads/inputData.csv')
U = list(U['GeneCall'])

instead of

U = U.rename(columns={0: "Scores"}) 
U = list(U['Scores'])

The rest does not need to be modified.

Regarding the following points: (1) Your model uses Ensembl IDs but your inputData.csv includes EntrezIDs. If you have a way to map them and extract the associated ubiquity scores properly, then this method is fine. (2) Your classification criterion sounds logical. Alternatively, you could also set a threshold based on the distribution of values and consider everything above your cut-off to be 1 (expressed) and below to be 0 (not expressed). In case you have multiple samples, the ubiquity scores are calculated by taking the mean over all samples. It also depends on what the TreatmentExpr values are exactly.

I hope this answers your questions.

Rohak72 commented 1 year ago

Hi Nantia,

Thanks, this makes a lot of sense! I wonder: could you successfully run pymCADRE with these inputs and if so, how long did it take? As I mentioned earlier, everything seems to be working fine, but the final "rank_reactions finished" log message was never reached (due to significant time delay).

Sorry for all the questions, let me know if you have any more info! Thanks.

Best, Rohak

NantiaL commented 1 year ago

Hey Rohak,

with your files I am getting the error:

TypeError: cannot pickle 'SwigPyObject' object

that is traced back to the COBRApy function find_blocked_reactions().

This seems to be due to the size of the model. Also using a model with over 10,000 reactions is definetely pretty challenging for conventional computer.

Best regards Nantia