dannycbowman / cageo-rnomads

Code examples from Bowman and Lees (2015), "Near real time weather and ocean model data access with rNOMADS," Computers & Geosciences. DOI: 10.1016/j.cageo.2015.02.013
GNU General Public License v2.0

Using rNOMADS to pull and process grib data, but causes XML error #1

Closed: rkertesz closed this issue 8 years ago.

rkertesz commented 8 years ago

It looks like when I change the example code from `urls.out <- CrawlModels(abbrev = "gfs_0p50", depth = 2, verbose = FALSE)` to `urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)`, I get the following parse error. I think it comes from something called by the web crawler function: `Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]`

A couple of questions:

1. Where can I use `XML:::HUGE`?
2. Do you think there is a better way of grabbing and processing the following data than using a GRIB file? I've never used GrADS; it may be easier, but I can't seem to find the right information there anyway.


From this, the interesting data are the cumulative probabilities for rainfall depths from 0.25 mm to 25.4 mm, specifically `Total_precipitation_surface_3_Hour Accumulation_probability_above_0p25`.

I am happy to keep using GRIB files, but I need to be able to drill down to a subset of the rap data without the parsing issue.

rkertesz commented 8 years ago

OK. This is sophomoric, but I was supposed to use narre, not rap. I'll see if this works without the error. The rap parsing issue still exists, though. This is the URL I eventually generated using the following settings:

- Filter script: http://nomads.ncep.noaa.gov/cgi-bin/filter_narre.pl
- Directory: /narre.20151116
- Subdirectory: ensprod
- File: narre.t14z.prob.grd130.f05.grib2
- Levels: surface only
- Variables: APCP only

URL: http://nomads.ncep.noaa.gov/cgi-bin/filter_narre.pl?file=narre.t14z.prob.grd130.f05.grib2&lev_surface=on&var_APCP=on&leftlon=0&rightlon=360&toplat=90&bottomlat=-90&dir=%2Fnarre.20151116%2Fensprod
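The grib filter URL above is just the filter script plus a query string: one `file=` entry, an `=on` flag per level (`lev_surface`) and per variable (`var_APCP`), a bounding box, and a percent-encoded `dir`. A minimal base R sketch that rebuilds it from those parts follows. The helper `build_filter_url` and its argument names are hypothetical (not part of rNOMADS), and it only reproduces the query layout seen in this one URL.

```r
# Hypothetical helper (not part of rNOMADS): rebuild a NOMADS grib filter URL.
build_filter_url <- function(script, file, levels, variables, dir,
                             leftlon = 0, rightlon = 360,
                             toplat = 90, bottomlat = -90) {
  base <- paste0("http://nomads.ncep.noaa.gov/cgi-bin/", script)
  # One "=on" flag per requested level and per requested variable.
  flags <- c(paste0("lev_", levels, "=on"), paste0("var_", variables, "=on"))
  # Bounding box in the same order as the URL above.
  bbox <- paste0(c("leftlon=", "rightlon=", "toplat=", "bottomlat="),
                 c(leftlon, rightlon, toplat, bottomlat))
  # reserved = TRUE percent-encodes "/" as %2F, matching the dir= value above.
  dir_enc <- utils::URLencode(dir, reserved = TRUE)
  query <- c(paste0("file=", file), flags, bbox, paste0("dir=", dir_enc))
  paste0(base, "?", paste(query, collapse = "&"))
}

narre_url <- build_filter_url(
  script = "filter_narre.pl",
  file = "narre.t14z.prob.grd130.f05.grib2",
  levels = "surface", variables = "APCP",
  dir = "/narre.20151116/ensprod")
print(narre_url)
```

Passing a vector of levels or variables simply adds more `lev_...=on` or `var_...=on` flags; the sketch does not check whether a given filter script accepts that combination.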

rkertesz commented 8 years ago

I tried narre and got the same error. To make it even more confusing (although it is late at night, so maybe I am just confused), the three access links on http://nomads.ncep.noaa.gov/ (grib filter, http, and OpenDAP-alt) disagree. Through the grib filter I get .prob (probability) data for 11/16 but not for 11/17. Browsing via http, I can get .prob files for both 11/16 and 11/17. Through OpenDAP I get no .prob data for either 11/16 or 11/17. That is unfortunate, because OpenDAP was able to parse and navigate the structure without throwing a fit.

dannycbowman commented 8 years ago

Hey rkertesz,

I just tried `urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)` and `urls.out <- CrawlModels(abbrev = "narre", depth = 1, verbose = FALSE)` and did not have any trouble.

I'm wondering if you're not using the most recent version of rNOMADS. Can you give me the output of `sessionInfo()`? Here's mine:

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rNOMADS_2.1.6 rvest_0.3.0 xml2_0.1.2

loaded via a namespace (and not attached):
[1] httr_0.6.0 selectr_0.2-3 magrittr_1.5 tools_3.2.2 Rcpp_0.12.1
[6] stringi_0.4-1 stringr_1.0.0 XML_3.98-1.3

rkertesz commented 8 years ago

Thanks for taking a look at this. Today I was actually able to get rap to work, but narre still didn't. I copied and pasted your code verbatim.

`urls.out <- CrawlModels(abbrev = "narre", depth = 1, verbose = FALSE)` gives `Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]`

`urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)` works OK.

`urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = TRUE)` prints `[1] "http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120"`

I noticed that some of my package versions differ from yours: rvest is newer, and many of the packages loaded via a namespace are newer. XML_3.98-1.3 is not loaded at all. [Edit: I loaded it explicitly, but it didn't help.]

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rNOMADS_2.1.6 rvest_0.3.1 xml2_0.1.2

loaded via a namespace (and not attached):
[1] httr_1.0.0 R6_2.1.1 magrittr_1.5 tools_3.2.2 Rcpp_0.12.2 stringi_1.0-1 stringr_1.0.0
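To line up the two `sessionInfo()` outputs, here is a small base R sketch that compares the non-base package versions as transcribed from them. The helper `compare_versions` is hypothetical; it relies on `utils::compareVersion()`, which returns 1 when the first version is newer, 0 when they are equal and -1 when the second is newer.

```r
# Package versions transcribed from the two sessionInfo() outputs above.
linux_pkgs <- c(rNOMADS = "2.1.6", rvest = "0.3.0", xml2 = "0.1.2",
                httr = "0.6.0", selectr = "0.2-3", magrittr = "1.5",
                Rcpp = "0.12.1", stringi = "0.4-1", stringr = "1.0.0",
                XML = "3.98-1.3")
windows_pkgs <- c(rNOMADS = "2.1.6", rvest = "0.3.1", xml2 = "0.1.2",
                  httr = "1.0.0", R6 = "2.1.1", magrittr = "1.5",
                  Rcpp = "0.12.2", stringi = "1.0-1", stringr = "1.0.0")

# Hypothetical helper: label each package as same, newer, or missing.
compare_versions <- function(first, second) {
  pkgs <- union(names(first), names(second))
  vapply(pkgs, function(p) {
    if (!p %in% names(second)) return("only in first")
    if (!p %in% names(first)) return("only in second")
    # compareVersion() returns 1 (first newer), 0 (equal) or -1 (second newer).
    cmp <- utils::compareVersion(first[[p]], second[[p]])
    c("newer in second", "same", "newer in first")[cmp + 2]
  }, character(1))
}

print(compare_versions(linux_pkgs, windows_pkgs))
```

With these inputs, rvest, httr, Rcpp and stringi are newer on the Windows machine, XML and selectr are loaded only on Linux, R6 only on Windows, and rNOMADS, xml2, magrittr and stringr match. Since xml2 is 0.1.2 on both machines, the xml2 version by itself does not account for the different behaviour.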

rkertesz commented 8 years ago

This call throws the error: `WebCrawler(url, depth = 1, verbose = TRUE)`. It doesn't error when I set `url` to the "rap" URL, but it does when I use the "narre" one.
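Judging from its `depth` argument and the call chain traced in this thread (`WebCrawler()`, then `LinkExtractor()`, then `xml2::read_html()`), the crawler recursively parses pages down to a fixed depth. The base R sketch below is not the rNOMADS source; it only illustrates a depth-limited crawl in general, with a hypothetical `get_links(url)` function standing in for the link extractor and a toy link table standing in for NOMADS pages. Because nothing catches errors, a single page the parser rejects aborts the whole crawl, which would be consistent with one bad page making the entire `CrawlModels()` call fail.

```r
# Hypothetical sketch of a depth-limited crawler (not the rNOMADS source).
# get_links(url) must return a character vector of child URLs.
crawl <- function(url, depth, get_links) {
  if (depth <= 0) return(url)             # depth used up: report this page
  children <- get_links(url)              # an error here aborts the whole crawl
  if (length(children) == 0) return(url)  # dead end: report this page
  unique(unlist(lapply(children, crawl,
                       depth = depth - 1, get_links = get_links)))
}

# Toy link table standing in for real pages; page "b" cannot be parsed.
toy_links <- list(root = c("a", "b"), a = c("a1", "a2"), b = "b1")
toy_get_links <- function(url) {
  if (url == "b") {
    stop("Excessive depth in document: 256 use XML_PARSE_HUGE option")
  }
  out <- toy_links[[url]]
  if (is.null(out)) character(0) else out
}

crawl("root", 1, toy_get_links)       # "a" "b": page "b" is never parsed
try(crawl("root", 2, toy_get_links))  # fails: parsing "b" raises the error
```

Whether a failure shows up therefore depends on which pages the crawl actually reaches, not only on the starting URL.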

dannycbowman commented 8 years ago

Check this out: http://stackoverflow.com/questions/17154308/parse-xml-files-1-megabyte-in-r

This actually makes sense: sometimes the XML document you're trying to pull is above the 1 MB threshold and sometimes it's not, so the error is not really predictable. The answer there does not help you much, but I can fix it in my code and upload a new version of rNOMADS. I'll get to it in the next few days, if not sooner.

rkertesz commented 8 years ago

Great. That is what I was afraid of. Looks like the culprit is here: ~~`links <- LinkExtractor("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120")` gives `Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]`~~

Actually, this is the culprit: `html.tmp <- xml2::read_html("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120")` gives `Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]`
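Since the failure was traced to a single `xml2::read_html()` call, one defensive option is to catch that specific error rather than let it abort everything. A minimal base R sketch, assuming the error text stays exactly as quoted in this thread; `is_depth_error` and `parse_or_null` are hypothetical names, and matching on message text is fragile (it breaks if libxml2 rewords the message).

```r
# Hypothetical helper: does this condition carry the nesting-depth message?
is_depth_error <- function(e) {
  grepl("Excessive depth in document", conditionMessage(e), fixed = TRUE)
}

# Run parser(x). Turn the depth error into a warning plus NULL so the caller
# can skip or retry that page; re-raise every other error unchanged.
parse_or_null <- function(x, parser) {
  tryCatch(parser(x), error = function(e) {
    if (!is_depth_error(e)) stop(e)
    warning("nesting-depth limit hit while parsing ", x, call. = FALSE)
    NULL
  })
}
```

In the thread's setting the parser would be `xml2::read_html`, for example `parse_or_null(url, xml2::read_html)`. This only makes the failure explicit; it does not make the page parseable.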

But when I look through the xml2 reference manual, I see no parser options, and no HUGE, anywhere: https://cran.r-project.org/web/packages/xml2/xml2.pdf

rkertesz commented 8 years ago

More info here: http://stackoverflow.com/questions/31419409/set-xml-parse-huge-option-for-xml2xml-text-in-r

Do you have a solution other than "HUGE"? I've been saying it so much that I've started to say it just like Trump.

dannycbowman commented 8 years ago

This is puzzling: the HTML on http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120 is really not that big (certainly < 1 MB) and seems the same as that of other models that work fine, such as the GFS. I am also a little concerned by the Stack Overflow question you referred to, since I need official CRAN solutions (not some user's GitHub fork), and the fact that the option doesn't even exist is worrying. Anyway, thank you for your research; you saved me a lot of time.
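For context on the error itself: libxml2 reports `Excessive depth in document: 256` when elements are nested more than 256 levels deep and the `XML_PARSE_HUGE` option is not set. That limit concerns nesting depth rather than byte size, which fits the observation that this page is small. One plausible (unverified) trigger is a page with many tags that are opened but never closed. The base R sketch below gives a rough depth estimate by counting opening and closing tags; `max_nesting_depth` is a hypothetical helper, and because libxml2's HTML parser implicitly closes many elements, the number it reports can differ from the tree the parser actually builds.

```r
# HTML void elements never take a closing tag, so they do not add depth.
void_elements <- c("area", "base", "br", "col", "embed", "hr", "img",
                   "input", "link", "meta", "param", "source", "track", "wbr")

# Hypothetical, rough estimate of the maximum element nesting depth.
max_nesting_depth <- function(html) {
  tags <- regmatches(html, gregexpr("<[^>]+>", html))[[1]]
  tags <- tags[!grepl("^<[!?]", tags)]  # drop comments, doctype, PIs
  if (length(tags) == 0) return(0L)
  tag_names <- tolower(sub("^<[[:space:]]*([A-Za-z0-9]+).*", "\\1", tags))
  closing <- grepl("^</", tags)
  no_depth <- grepl("/>$", tags) | tag_names %in% void_elements
  step <- ifelse(closing, -1L, ifelse(no_depth, 0L, 1L))
  max(0L, cumsum(step))                 # deepest point reached
}

small_but_deep <- paste(rep("<div>", 300), collapse = "")
nchar(small_but_deep)                   # 1500 characters
max_nesting_depth(small_but_deep)       # 300, above the 256 limit
```

A 1.5 kB string of 300 unclosed `<div>` tags already goes past 256 levels, so a small page can trip the limit just as easily as a large one.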

I've opened a stack overflow question here: http://stackoverflow.com/questions/33819103/parsing-small-web-page-with-xml2-throws-xml-parse-huge-error

XML parsing is not my strong point, and I've had success with rNOMADS related questions on Stack Overflow before.

rkertesz commented 8 years ago

Did anything ever come of this? Still just a hanging chad for the moment?

dannycbowman commented 8 years ago


Thank you for reminding me. I just figured out a workaround, but I don't think I will be able to add it into the official package yet.

If you install shabbychef's version of xml2 (see his comment below the main question), the issue seems to be resolved.

It works for me; try it out and let me know how it works for you.




rkertesz commented 8 years ago

It works. I've run into another interesting bug but it's unrelated. I can post here but it relates more to rNOMADS "core". I will look to see if there is a better place to post.

dannycbowman commented 8 years ago

Thank you for checking. Which solution did you use: shabbychef's or was it eventually incorporated into an official version of xml2? I never got any response from posting on the xml2 github site.
