dannycbowman / cageo-rnomads

Code examples from Bowman and Lees (2015) Near real time weather and ocean model data access with rNOMADS, Computers & Geosciences DOI: 10.1016/j.cageo.2015.02.013
GNU General Public License v2.0

Using rNOMADS to pull and process grib data, but causes XML error #1

Closed rkertesz closed 8 years ago

rkertesz commented 8 years ago

It looks like when I change the example code from `urls.out <- CrawlModels(abbrev = "gfs_0p50", depth = 2, verbose = FALSE)` to `urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)` I get the following parse error. I think it comes from something called by the web crawler function:

Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

A couple of questions: 1. Where can I use XML:::HUGE, and 2. do you think there is a better way of grabbing and processing the following data than using a grib file? I've never used GrADS and it may be easier, but I can't seem to find the right info on it anyway.

http://www.ftp.ncep.noaa.gov/data/nccf/com/rap/prod/narre.20151116/ensprod/narre.t14z.prob.grd130.f04.grib2

From this file, the interesting fields are the cumulative probabilities for rainfall from 0.25 mm to 25.4 mm depth, specifically "Total_precipitation_surface_3_Hour Accumulation_probability_above_0p25".

I am happy to continue using the grib file but I need to be able to drill down to a subset of rap without the parsing issue.
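For what it's worth, once a grib2 file like the one above is on disk, rNOMADS can extract specific variables and levels with its ReadGrib function (which requires the external wgrib2 utility to be installed); a minimal sketch, assuming the file has already been downloaded to the working directory:

```r
library(rNOMADS)

# Assumes the NARRE grib2 file linked above has been downloaded locally,
# and that wgrib2 is installed (ReadGrib shells out to it).
grib.file <- "narre.t14z.prob.grd130.f04.grib2"

# Pull only the surface-level accumulated precipitation (APCP) records
model.data <- ReadGrib(grib.file, levels = "surface", variables = "APCP")

# model.data$value then holds the probability fields described above
```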

rkertesz commented 8 years ago

OK, this is sophomoric, but I was supposed to use narre, not rap. I'll see if this flies without the error. Still, the rap parsing issue exists, but this is the URL I eventually generated using the following settings:

On http://nomads.ncep.noaa.gov/cgi-bin/filter_narre.pl:
directory: /narre.20151116
subdirectory: ensprod
file: narre.t14z.prob.grd130.f05.grib2
surface levels only, APCP data only

URL= http://nomads.ncep.noaa.gov/cgi-bin/filter_narre.pl?file=narre.t14z.prob.grd130.f05.grib2&lev_surface=on&var_APCP=on&leftlon=0&rightlon=360&toplat=90&bottomlat=-90&dir=%2Fnarre.20151116%2Fensprod
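That query string can also be assembled programmatically, which makes it easy to loop over dates and forecast hours; a small sketch (BuildFilterURL is a hypothetical helper, not part of rNOMADS; the parameters simply mirror the hand-built URL above):

```r
# Build a NOMADS grib filter URL from its parts (hypothetical helper;
# the query parameters mirror the hand-built URL above).
BuildFilterURL <- function(model.date, file.name) {
  base <- "http://nomads.ncep.noaa.gov/cgi-bin/filter_narre.pl"
  query <- paste0(
    "?file=", file.name,
    "&lev_surface=on",  # surface levels only
    "&var_APCP=on",     # accumulated precipitation only
    "&leftlon=0&rightlon=360&toplat=90&bottomlat=-90",  # whole grid
    "&dir=%2Fnarre.", model.date, "%2Fensprod"
  )
  paste0(base, query)
}

url <- BuildFilterURL("20151116", "narre.t14z.prob.grd130.f05.grib2")
# download.file(url, destfile = "narre.t14z.prob.grd130.f05.grib2", mode = "wb")
```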

rkertesz commented 8 years ago

Tried using narre and got the same error. To make it even more confusing (although it is late at night, so maybe I am just confused): if I look at the three links from http://nomads.ncep.noaa.gov/, specifically the grib filter, http, and OpenDAP-alt links, then I get .prob (probability) data for 11/16 but not for 11/17 when using the grib filter. I can get .prob files for both 11/16 and 11/17 when browsing http, but I get no .prob data for either date when using OpenDAP. That is unfortunate, because OpenDAP was able to parse and navigate the structure without throwing a fit.

dannycbowman commented 8 years ago

Hey rkertesz,

I just tried `urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)` and `urls.out <- CrawlModels(abbrev = "narre", depth = 1, verbose = FALSE)` and did not have any trouble.

I'm wondering if you're not using the most recent version of rNOMADS. Can you give me the output of sessionInfo()? Here's mine:

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rNOMADS_2.1.6 rvest_0.3.0 xml2_0.1.2

loaded via a namespace (and not attached):
[1] httr_0.6.0    selectr_0.2-3 magrittr_1.5  tools_3.2.2   Rcpp_0.12.1
[6] stringi_0.4-1 stringr_1.0.0 XML_3.98-1.3

rkertesz commented 8 years ago

Thanks for taking a look at this. I was able to get rap to work today, yet narre still didn't work. I copied and pasted your text verbatim.

urls.out <- CrawlModels(abbrev = "narre", depth = 1, verbose = FALSE)
Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = FALSE)
[Works OK]

urls.out <- CrawlModels(abbrev = "rap", depth = 1, verbose = TRUE)
[1] "http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120"


I noticed that some of my packages are slightly different: rvest is newer, and many of the packages loaded in the namespace are newer. There is no XML_3.98-1.3 [Edit: I added it explicitly, but it didn't help.]

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rNOMADS_2.1.6 rvest_0.3.1 xml2_0.1.2

loaded via a namespace (and not attached):
[1] httr_1.0.0    R6_2.1.1      magrittr_1.5  tools_3.2.2   Rcpp_0.12.2   stringi_1.0-1 stringr_1.0.0

rkertesz commented 8 years ago

This is throwing me the error: WebCrawler(url, depth = 1, verbose = TRUE) doesn't error when I populate url with the "rap" address, but does when I use "narre".

dannycbowman commented 8 years ago

Check this out: http://stackoverflow.com/questions/17154308/parse-xml-files-1-megabyte-in-r This makes sense, actually. Sometimes the XML document you're trying to pull is above the 1 MB threshold, sometimes it's not, so the error is not really predictable. The answer there does not help you much, but I can fix it in my code and upload a new version of rNOMADS. I'll get to it in the next few days, if not sooner.

rkertesz commented 8 years ago

Great. That is what I was afraid of. Looks like the culprit is here: ~~links <- LinkExtractor("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120") Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]~~

Actually, this is the culprit:

html.tmp <- xml2::read_html("http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120")
Error: Excessive depth in document: 256 use XML_PARSE_HUGE option [1]

But when I go to the documentation for xml2, I see no parse options or HUGE anywhere: https://cran.r-project.org/web/packages/xml2/xml2.pdf
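For what it's worth, later xml2 releases did eventually expose libxml2's parser flags, including HUGE, through the options argument of read_xml/read_html; a sketch against a synthetic deeply nested document (this assumes a recent xml2, not the 0.1.2 build in the session info above):

```r
library(xml2)

# libxml2 refuses documents nested more than 256 elements deep unless
# the HUGE flag is set; build a string that trips that limit.
deep.html <- paste0(strrep("<div>", 300), "x", strrep("</div>", 300))

# With default options this may fail with
# "Excessive depth in document: 256 use XML_PARSE_HUGE option"
shallow <- try(read_html(deep.html), silent = TRUE)

# Adding "HUGE" to the default option set lifts the depth limit
doc <- read_html(deep.html,
                 options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE"))
inherits(doc, "xml_document")
```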

rkertesz commented 8 years ago

More info here. Do you have another solution than "HUGE"? I am saying that word so much I've started to say it just like Trump: http://stackoverflow.com/questions/31419409/set-xml-parse-huge-option-for-xml2xml-text-in-r

dannycbowman commented 8 years ago

This is puzzling: the HTML at http://nomads.ncep.noaa.gov/cgi-bin/filter_rap.pl?dir=%2Frap.20151120 is really not that big (certainly < 1 MB) and seems the same as for other models that work fine, such as the GFS. I am also a little concerned by the Stack Overflow question you referred to, since I need official CRAN solutions (not some user's GitHub), and the fact that the option doesn't even exist is worrying. Anyway, thank you for your research; you saved me a lot of time.

I've opened a stack overflow question here: http://stackoverflow.com/questions/33819103/parsing-small-web-page-with-xml2-throws-xml-parse-huge-error

XML parsing is not my strong point, and I've had success with rNOMADS related questions on Stack Overflow before.

rkertesz commented 8 years ago

Did anything ever come of this? Still just a hanging chad for the moment?

dannycbowman commented 8 years ago

Ruben,

Thank you for reminding me. I just figured out a workaround, but I don't think I will be able to add it into the official package as yet.

See:

http://stackoverflow.com/questions/31419409/set-xml-parse-huge-option-for-xml2xml-text-in-r

If you install shabbychef's version of xml2 (see his comment below the main question), the issue seems to be resolved.

It works for me - try it out and let me know how it works for you.

Danny

Daniel C. Bowman
Doctoral Candidate in Geophysics, UNC Chapel Hill
phone: 575-418-8555
curriculum vitae: http://www.unc.edu/~haksaeng/curriculum_vitae/bowman_cv.pdf
LinkedIn: https://www.linkedin.com/in/dannycbowman
web: http://geosci.unc.edu/page/daniel-c-bowman
twitter: @dannycbowman



rkertesz commented 8 years ago

It works. I've run into another interesting bug, but it's unrelated. I could post it here, but it relates more to rNOMADS core. I will look to see if there is a better place to post.

dannycbowman commented 8 years ago

Thank you for checking. Which solution did you use: shabbychef's, or was it eventually incorporated into an official version of xml2? I never got any response from posting on the xml2 GitHub site.
