Accio / KEGGgraph

The KEGGgraph package to parse KEGG pathways in R into graph objects

parseKGML2Graph returning an empty file #7

Sabor117 closed 1 year ago

Sabor117 commented 2 years ago

I have been having an issue with the parseKGML2Graph() function, stemming from its internal function parseKGML().

The error in question is:

Error in if (fileSize < 100L) msg <- paste(msg, "[WARNING] File size (",  : 
  missing value where TRUE/FALSE needed

This error implies that the KGML file that parseKGML2Graph() is looking for does not exist or is empty. However, this does not seem to be the case.
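For reference, the message itself is what R reports when an if() condition evaluates to NA. The base-R sketch below reproduces it under one assumption that the thread does not confirm: that the failing check compares a file size obtained from something like file.info(). The variable names are illustrative and are not taken from the package source.

```r
# A URL string is not a path on the local filesystem, so file.info()
# reports NA for its size (assumption: the package's check uses a
# similar lookup).
kgml_url <- "http://rest.kegg.jp/get/hsa05010/kgml"
file_size <- file.info(kgml_url)$size

# Using NA as an if() condition is an error; capture its message.
msg <- tryCatch(
  if (file_size < 100L) "small file" else "file looks fine",
  error = function(e) conditionMessage(e)
)

print(is.na(file_size))
print(msg)
```

If that assumption holds, the error may indicate that the function was handed something other than a readable local file, and not necessarily that the KGML on the server is empty.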

I am working from a large list of pathways in KEGG and have been seeing numerous errors for multiple different pathways. The code I am using is as follows:

tmp_fl = tempfile()

pathway_kgml = try(KEGGgraph::retrieveKGML(path_check, ### Search for given pathway
                                  organism = "hsa", ### Organism
                                  destfile = tmp_fl,
                                  method = "wget", ### Utilises wget method
                                  quiet = TRUE))

pathway_info = KEGGgraph::parseKGML2Graph(pathway_kgml, ### pathway kgml file
                                 expandGenes = TRUE, ### expand paralogue nodes
                                 genesOnly = FALSE) ### include connections to things which aren't genes

This list is not exhaustive, but here are three values of path_check that produce this error: path_check = "path:hsa05200", path_check = "path:hsa05206" and path_check = "path:hsa05010".

All three of these pathways produce pathway_kgml outputs from the first function:

"http://rest.kegg.jp/get/hsa05010/kgml"
"http://rest.kegg.jp/get/hsa05200/kgml"
"http://rest.kegg.jp/get/hsa05206/kgml"

All three appear to be existing XML files on KEGG, so it is confusing that the second function is not able to access them correctly.

Any help appreciated.

Sabor117 commented 1 year ago

I wanted to add to this issue: I have been noticing it with increasing frequency recently and, more problematically, it is now occurring with pathways where it previously worked.

It's difficult to provide specific dates, but I have previously run the command:

pathway_info = KEGGgraph::parseKGML2Graph(pathway_kgml, ### pathway kgml file
                                 expandGenes = TRUE, ### expand paralogue nodes
                                 genesOnly = FALSE) ### include connections to things which aren't genes

This was with pathway hsa04010 (as an example). I can check this because at some point (around March of this year, 2022) I was able to output a list of genes from this exact command, and I saved the list.

Something to note is that I have recently been switching from R version 3.6 to 4.1, and I am wondering whether this is the source of the problem and whether the package needs updating to work with R 4.x.

Sabor117 commented 1 year ago

So, I've been working on this a fair bit the last two weeks and I have an update for David and anyone else who ends up encountering this problem.

It APPEARS that the parseKGML2Graph function is actually working pretty much as intended. After all of this time, that was never the issue.

Instead, the issue may be in the retrieveKGML function. If you use the output of that function directly with parseKGML2Graph it will not work and will throw the error I mention above (even though it will show you a link to a working KGML).

However, if you use that link and physically download a copy of the KGML (using wget or something) and then use parseKGML2Graph with that local copy then suddenly it works again.

This suggests there is something either not working quite right with retrieveKGML or the issue may even be with the KEGG site itself.
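The workaround described above (download the KGML to a local file, then parse the local copy) could be sketched roughly as follows. The helper fetch_kgml_locally and its downloader argument are hypothetical names introduced for illustration, the URL pattern is copied from the links shown earlier in this thread, and the 100-byte threshold simply mirrors the check visible in the error message.

```r
# Hypothetical helper (not part of KEGGgraph): download a pathway's KGML
# to a local file and return the path, failing early if the file is
# missing or suspiciously small.
fetch_kgml_locally <- function(pathway_id,
                               destfile = tempfile(fileext = ".xml"),
                               downloader = utils::download.file) {
  # URL pattern taken from the links quoted in this thread
  url <- sprintf("http://rest.kegg.jp/get/%s/kgml", pathway_id)

  # download.file() returns 0 on success
  status <- downloader(url, destfile, quiet = TRUE)

  size <- file.info(destfile)$size
  if (status != 0L || is.na(size) || size < 100L) {
    stop("KGML download for ", pathway_id, " failed or is too small")
  }
  destfile
}

# Usage (needs network access and the KEGGgraph package):
# kgml_path <- fetch_kgml_locally("hsa05200")
# pathway_info <- KEGGgraph::parseKGML2Graph(kgml_path,
#                                            expandGenes = TRUE,
#                                            genesOnly = FALSE)
```

If retrieveKGML has already written the file to its destfile argument (tmp_fl in the first snippet), passing that path to parseKGML2Graph instead of the returned link may have the same effect. That is an assumption and has not been confirmed in this thread.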