PSU-CSAR / bagis-pro

BAGIS for ArcGIS Pro

Specification for uploading .pdf reports to NRCS AGOL #12

Closed lbross closed 1 year ago

lbross commented 3 years ago

We will likely need to upload the .pdfs initially using the Python ArcGIS REST API, since ESRI is dragging its feet on our ESRIHttpRequest ticket. We need to develop a specification for uploading these .pdfs to AGOL. Some questions:

  1. Do we upload both the full reports and each single page?
  2. Is there a folder structure for organizing the reports/single pages?
  3. Template for brief summary, description, and terms of use?
  4. What tag(s) should be used? AGOL requires at least one.
  5. Any categories or credits?

Initially this will be a command-line script so that we can get some of the reports online.

jdduh commented 3 years ago

Do we upload both the full reports and each single page? Both. We need to come up with unique names for each file. Most maps are single page, but some "individual" reports could have more than one page.

Is there a folder structure for organizing the reports/single pages? I created a folder called "basin_analysis_reports." Let's put the reports there. All uploaded reports need to be shared with the "nwcc_Content" group and have a sharing level of "Everyone (public)".

Template for brief summary, description, and terms of use? Use the map/table/diagram name as the summary. The description is "This report was generated in Basin Analysis GIS (BAGIS). See _____.pdf (a link to the online help pdf file to be hosted on NRCS AGOL) for a complete description of the report. Please contact NWCC (https://www.wcc.nrcs.usda.gov/) for any questions." Terms of use: "Public domain data. See https://www.wcc.nrcs.usda.gov/disclaimer.htm for disclaimer."

What tag(s) should be used? AGOL requires at least one. GIS, BAGIS, SNOTEL, eBagis

Any categories or credits? Leave categories blank for now. Credits: Basin Analysis GIS is developed under the collaboration between NRCS NWCC and the Center for Spatial Analysis & Research at Portland State University.
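
A minimal sketch of the shared settings above as Python constants, assuming the upload script builds one metadata dictionary per PDF. The key names are the AGOL item fields as I understand them (snippet for the summary, licenseInfo for terms of use, accessInformation for credits); the help-file link is still the "_____.pdf" placeholder because its name has not been chosen. The function `item_properties` is hypothetical and makes no network calls.

```python
from typing import Dict, Iterable

# Shared settings from this thread; hypothetical constant names.
TARGET_FOLDER = "basin_analysis_reports"
SHARE_GROUPS = ["nwcc_Content"]
SHARE_EVERYONE = True  # sharing level "Everyone (public)"
DEFAULT_TAGS = ["GIS", "BAGIS", "SNOTEL", "eBagis"]

DESCRIPTION = (
    "This report was generated in Basin Analysis GIS (BAGIS). "
    "See _____.pdf for a complete description of the report. "  # placeholder link
    "Please contact NWCC (https://www.wcc.nrcs.usda.gov/) for any questions."
)
TERMS_OF_USE = (
    "Public domain data. "
    "See https://www.wcc.nrcs.usda.gov/disclaimer.htm for disclaimer."
)
CREDITS = (
    "Basin Analysis GIS is developed under the collaboration between NRCS NWCC "
    "and the Center for Spatial Analysis & Research at Portland State University."
)


def item_properties(title: str, summary: str,
                    extra_tags: Iterable[str] = ()) -> Dict[str, str]:
    """Build the metadata for one PDF; the summary is the map/table name."""
    tags = list(DEFAULT_TAGS)
    for tag in extra_tags:          # add per-document tags, skipping repeats
        if tag not in tags:
            tags.append(tag)
    return {
        "type": "PDF",
        "title": title,
        "snippet": summary,
        "description": DESCRIPTION,
        "licenseInfo": TERMS_OF_USE,
        "accessInformation": CREDITS,
        "tags": ",".join(tags),     # comma-separated, so tags must not contain commas
    }


if __name__ == "__main__":
    props = item_properties("Elwha Aspect Map", "Aspect Map", ["Elwha"])
    print(props["tags"])  # GIS,BAGIS,SNOTEL,eBagis,Elwha
```

Only the title, summary, and extra tags vary per document; everything else stays in one place so it can later move to a configuration file.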

lbross commented 3 years ago

The initial upload for this interface will be simple. We provide it with a folder path and it will iterate through all of the .pdf files in the path and upload them to AGOL. Each file will have a unique name. We can append the AOI name to the base .pdf name when BAGIS-PRO copies them to the central folder. I'm not sure what you mean by "individual" reports? We have the single page maps/charts and the two multipage reports.
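
A rough sketch of the folder loop, assuming the actual AGOL upload is passed in as a callback (the `upload` parameter is a placeholder, not a real API). The naming helper `copy_name` is hypothetical; it puts the AOI name in front, which is how titles later in the thread read, and appending it at the end would work the same way.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, List

log = logging.getLogger("pdf_upload")


def copy_name(aoi_name: str, pdf_name: str) -> str:
    """Hypothetical naming rule: AOI name joined to the base .pdf name."""
    return f"{aoi_name}_{pdf_name}"


def upload_folder(folder: Path, upload: Callable[[Path], None]) -> List[str]:
    """Hand every .pdf in *folder* to *upload* and log each one.

    Matching on the lower-cased suffix also picks up files saved as .PDF.
    """
    done = []
    pdfs = sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")
    for pdf in pdfs:
        upload(pdf)                      # placeholder for the AGOL call
        log.info("uploaded %s", pdf.name)
        done.append(pdf.name)
    return done


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / copy_name("Elwha", "aspect_map.pdf")).write_bytes(b"%PDF-1.4")
        (folder / "readme.txt").write_text("not a report")
        print(upload_folder(folder, lambda p: None))  # ['Elwha_aspect_map.pdf']
```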

How are you planning to support searching the reports? If we put them all in the same folder, we will have hundreds eventually.

We can't extract the map/table/diagram name out of the .pdf to use it for the summary. We could do a matrix which looks for patterns in the file name to assemble a pre-defined summary. We could also add some code on the BAGIS-PRO side to create a text file that the script can read to find the summary.
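
The pattern-matrix idea could look roughly like this: an ordered list of regular expressions checked against the file name, where the first match supplies a pre-defined summary. The patterns and summary wording below are made up for illustration; they are not the real BAGIS-PRO file names.

```python
import re
from typing import Optional

# Hypothetical pattern matrix: first matching rule wins.
SUMMARY_RULES = [
    (re.compile(r"aspect", re.IGNORECASE), "Aspect map"),
    (re.compile(r"elevation", re.IGNORECASE), "Elevation distribution map"),
    (re.compile(r"critical[_ ]?precip", re.IGNORECASE),
     "Critical precipitation map/table"),
]


def summary_for(file_name: str) -> Optional[str]:
    """Return the pre-defined summary for the first matching pattern, or None."""
    for pattern, summary in SUMMARY_RULES:
        if pattern.search(file_name):
            return summary
    return None
```

Returning None for an unknown file lets the script log it and skip it instead of uploading a document with an empty summary. The text-file alternative avoids maintaining these patterns at all.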

Let's pick a name for the help file that will be hosted on NRCS AGOL. We don't need content right away but we can put up a template and then we will have a link. We can update and overwrite the template whenever we have a new version and the link should still work. It is easier to set the description when we upload than to go back and update it later.

For this initial attempt, I would rather not support version control. We will check whether the file name already exists; if it does, we will overwrite it and make a note in the log. We could also write a prequel script that checks whether any of the files exist, giving the analyst a chance to change file names if they don't want to overwrite.
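
A sketch of the overwrite rule and the prequel check, assuming the titles already in the AGOL folder have been fetched into a set beforehand (that query is not shown). Both function names are hypothetical.

```python
import logging
from typing import Dict, Iterable, List, Set

log = logging.getLogger("pdf_upload")


def find_conflicts(local_titles: Iterable[str], existing: Set[str]) -> List[str]:
    """Prequel check: titles that would overwrite an existing item."""
    return sorted(t for t in local_titles if t in existing)


def plan_upload(local_titles: Iterable[str], existing: Set[str]) -> Dict[str, str]:
    """No version control: mark each title 'add' or 'overwrite' and log overwrites."""
    plan = {}
    for title in local_titles:
        if title in existing:
            log.warning("%s already exists; it will be overwritten", title)
            plan[title] = "overwrite"
        else:
            plan[title] = "add"
    return plan
```

Running `find_conflicts` first gives the analyst the list to rename; `plan_upload` is what the upload itself would follow.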

jdduh commented 3 years ago

Re: "We have the single page maps/charts and the two multipage reports": the critical precipitation map/table is an example of a single page map/chart that could have multiple pages. Some maps or tables don't make a lot of sense by themselves. I will use the map/chart list that you provide to come up with a spec sheet showing which single page maps/tables have multiple pages.

Searching the reports? Probably by AOI name and through a spatial query on a point layer. I will compile a USGS forecast point layer for that purpose. The URL (or some kind of unique search keyword) associated with each report needs to be saved in the attribute table of the point layer. AGOL doesn't support a nested folder structure, which makes it impractical to give each AOI its own folder.

It's a good idea to create a text file with the attributes that AGOL asks for. Another file to be hosted on NRCS AGOL?

I will create a dummy help file and post the URL here. Maybe we should put all the BAGIS Pro files in a designated folder in AGOL. Where did you put the streamflow .csv file?

Version control - let's not support the version control for now. However, we might need to allow multiple reports to be uploaded to the same AOI in the future. This feature could be essential for those who want to generate their customized basin analysis report.

lbross commented 3 years ago

Thank you for offering to specify the multiple page reports. We may want to consider combining the aspect map and chart, for example. Or all of the SWE maps together?

For searching, both NRCS AGOL and the hub seem to use the document title, description, and tags as search fields. They do not use the summary field. If the AOI name is part of each document title and we use the stationName from the pourpoint layer as a tag, that should work. Do you also want to add the stationTriplet from the pourpoint layer as a tag?

The AOI name in BAGIS has always been derived from the parent folder. In BAGIS V3, the date is typically appended to the AOI name to generate a unique name for the parent folder. If the AOI name (from the parent folder) is part of each document title and the stationName from the pourpoint layer is used as a tag, that would support multiple reports from the same AOI area.

We will need two different "configuration" files. One to contain the information that is the same for all .pdfs that are posted. For example: content group, sharing, credits. For the initial release, I will have this information in the Python code for simplicity, but it should be pulled out eventually in case it needs to change. I am inclined to keep it next to the batch export settings, but not in the same structure because the client shouldn't need to update it.

The second configuration file will be written by BAGIS-PRO as each .pdf is generated. It will have the file name, the summary, and the tags, since they are unique to each document. It will be a single file in the folder with all the .pdfs, with one line for each document to be published.
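
One possible layout for that per-folder file, assuming CSV with a header row and the tags joined by semicolons in a single column. The column names and both functions are hypothetical; BAGIS-PRO itself is C#, so this only shows the format and how the Python upload script might read it.

```python
import csv
import io
from typing import Dict, List, TextIO

FIELDS = ["file_name", "summary", "tags"]  # hypothetical column names


def write_manifest(rows: List[dict], out: TextIO) -> None:
    """Write one line per .pdf; tags are joined with ';' in one column."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({"file_name": row["file_name"],
                         "summary": row["summary"],
                         "tags": ";".join(row["tags"])})


def read_manifest(src: TextIO) -> Dict[str, dict]:
    """Map each file name to its summary and tag list for the upload script."""
    manifest = {}
    for row in csv.DictReader(src):
        tags = [t.strip() for t in row["tags"].split(";") if t.strip()]
        manifest[row["file_name"]] = {"summary": row["summary"], "tags": tags}
    return manifest


if __name__ == "__main__":
    buf = io.StringIO()
    write_manifest([{"file_name": "Elwha_aspect_map.pdf",
                     "summary": "Aspect Map",
                     "tags": ["Elwha at Mcdonald Bridge 12045500 WA"]}], buf)
    buf.seek(0)
    print(read_manifest(buf))
```

When reading or writing a real file, open it with `newline=""` as the csv module documentation recommends; the csv module also quotes summaries that contain commas.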

Putting the BAGIS Pro config files in a designated folder in AGOL is a good idea. The streamflow .csv file is in the nwcc_nrcs folder. Let me know when you move it so I can make sure that it still works.

jdduh commented 3 years ago

Basin Analysis Report Users Manual PDF URL. The file is located in the bagis folder of the nwcc_nrcs account. https://nrcs.maps.arcgis.com/sharing/rest/content/items/b121d25cc73c4b30a700b8d2d2ea23bc/data

jdduh commented 3 years ago

Please move/put all the bagis pro system files to the bagis folder. The nwcc_nrcs is the home (root) folder.

lbross commented 3 years ago

I moved the streamflow.csv file into the new bagis folder. AGOL retained the item id and the existing code continued to work. Today I manually posted several PDF documents to the basin_analysis_reports folder. I used the specification above for the tags and descriptions. Searching for 'elwha' should locate the documents. I hope this can serve as a POC to see how the documents look and are managed by AGOL and the hub. I will post additional documents as time allows prior to your meeting next week with NWCC. It is tedious. I am grateful we will be scripting the process.

lbross commented 3 years ago

Use the pourpoint stationName as-is (with spaces) in a compound tag of station name, station ID, and state, for example: Elwha at Mcdonald Bridge 12045500 WA.
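
A small sketch of building that compound tag. It assumes the station ID and state can be taken from the pourpoint stationTriplet and that the triplet is shaped like `ID:STATE:NETWORK` (for example `12045500:WA:USGS`); that layout is an assumption here, and `compound_tag` is a hypothetical helper.

```python
def compound_tag(station_name: str, station_triplet: str) -> str:
    """Build 'stationName stationId state'; the name is kept as-is, spaces included.

    Assumes an ID:STATE:NETWORK triplet; any other shape raises ValueError.
    """
    station_id, state, _network = station_triplet.split(":")
    return f"{station_name} {station_id} {state}"
```

If the new master-list field replaces stationName, only the first argument changes.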

lbross commented 3 years ago

@jdduh to add a new field to the master aoi list that we will use in place of the pourpoint stationName

Explore updating fields in the master aoi list when the .pdfs are uploaded. We may want to update the agol_report_url in the master aoi list with a unique url.
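
If agol_report_url should hold a direct link to the PDF, it could be derived from the AGOL item id using the same pattern as the users manual link posted in this thread. This is a sketch under that assumption; the Hub URL may differ, and `report_url` is a hypothetical helper. Writing the value back to the feature service is not covered.

```python
import re

PORTAL = "https://nrcs.maps.arcgis.com"
_ITEM_ID = re.compile(r"^[0-9a-f]{32}$")  # AGOL item ids are 32 hex characters


def report_url(item_id: str) -> str:
    """Build the /data download URL for an uploaded PDF item."""
    if not _ITEM_ID.match(item_id):
        raise ValueError(f"not an AGOL item id: {item_id!r}")
    return f"{PORTAL}/sharing/rest/content/items/{item_id}/data"
```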

jdduh commented 3 years ago

Please add the HUC attribute as a tag when publishing the pdf. HUC attribute values can be found in the master AOI list/feature service. I will provide additional information on the unique URLs to the pdf files. It seems the URL that users get from Hub is different from what they get from AGOL.

lbross commented 3 years ago

I'm not sure if we'll be able to automate updating the unique url in the master AOI list. Because the AOI .xlsx sources a feature service, it becomes more complicated. I'm not saying for sure, no, but this will probably be one of the later things we do. If we can figure it out.

Also, I don't think the HUC attribute is valid on the current master AOI list/feature service. The value seems to be the same for multiple AOIs. And when I look at the snotel layer for the Dungeness AOI on the Olympic Peninsula, it doesn't match what is on the master list. It may be that Excel is treating the values as numbers instead of strings?
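
If the cause is Excel storing HUCs as numbers, one possible repair when reading the master list is to turn each value back into a zero-padded string of the expected length. This is a hedged sketch: `normalize_huc` is hypothetical, and it would not explain the duplicate values, which may have a different cause.

```python
from typing import Union

VALID_HUC_LENGTHS = {2, 4, 6, 8, 10, 12}


def normalize_huc(value: Union[str, int, float], digits: int) -> str:
    """Turn a HUC read back from Excel into a zero-padded string.

    A number loses leading zeros (1010002 instead of 01010002) and may come
    back as a float such as 17110020.0; both cases are repaired here.
    """
    if digits not in VALID_HUC_LENGTHS:
        raise ValueError(f"unexpected HUC length: {digits}")
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"HUC is not a whole number: {value}")
        text = str(int(value))
    else:
        text = str(value).strip()
    if not text.isdigit() or len(text) > digits:
        raise ValueError(f"cannot interpret {value!r} as HUC{digits}")
    return text.zfill(digits)
```

Formatting the column as text in the .xlsx would avoid the problem at the source.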

lbross commented 2 years ago

I have developed a button and some functions for BAGIS Pro that update all of the key fields for a collection of PDFs for an AOI to expedite the upload process until we get Python working. The algorithm works as follows:

  1. The file is initially uploaded through the online interface. The only field that is updated during upload is the title. It defaults to the .pdf file name (with underscores), so I replace the underscores with spaces.
  2. Simple dialog that takes a station triplet as the argument
  3. BAGIS-P looks up the AOI in the master list and retrieves the nwcc name and the huc
  4. BAGIS-P queries the portal for all pdf file types that include the nwcc file name in the title. (Querying by file name is not an option with the API currently)
  5. BAGIS-P loops through the resulting file names and matches them against a list of file names we want to update. The file names are constants in C#.
  6. If there is a match, BAGIS-P updates the metadata and permissions for that file. The only thing right now that differs between the files is the summary. I derive that from the file title I entered in step 1.
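
The steps above are implemented in C# inside BAGIS-P; the following is only a Python sketch of the same flow with in-memory stand-ins for the master list and the portal query results. `REPORT_TITLES`, `PortalItem`, and the `nwcc_name`/`huc` keys are hypothetical, and permissions are not modeled. The HUC is added as a tag, as requested in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for the C# constants listing the reports to update.
REPORT_TITLES = {"Aspect Map", "Slope Map", "Watershed Report"}


@dataclass
class PortalItem:
    """Minimal stand-in for a PDF item returned by the portal query."""
    title: str
    summary: str = ""
    tags: List[str] = field(default_factory=list)


def clean_title(file_name: str) -> str:
    """Step 1: the default title is the file name; swap underscores for spaces."""
    stem = file_name[:-4] if file_name.lower().endswith(".pdf") else file_name
    return stem.replace("_", " ")


def update_aoi_items(triplet: str, master: Dict[str, Dict[str, str]],
                     items: List[PortalItem]) -> List[str]:
    """Steps 2-6 for one station triplet; returns the titles it updated."""
    aoi = master[triplet]                     # step 3: look up the AOI
    name, huc = aoi["nwcc_name"], aoi["huc"]
    updated = []
    for item in items:
        if name not in item.title:            # step 4: title contains the name
            continue
        report = item.title.replace(name, "", 1).strip()
        if report in REPORT_TITLES:           # step 5: one of the known reports
            item.summary = report             # step 6: summary from the title
            item.tags = sorted({*item.tags, name, huc})
            updated.append(item.title)
    return updated
```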

This means it takes me about 5 minutes to upload an AOI rather than 20, and it's less error prone. I uploaded a new version to basins if you want to try it. Because it's for me only, I didn't create any progress indicators, so it may seem like it isn't working, but it is. I finished uploading the Boise AOI just now using this process.

jdduh commented 2 years ago

NWCC is considering hosting the watershed report pdf files on their own "ftp" server. This should simplify the publication efforts on our end. We need to do some pilot/proof-of-concept testing. The idea is that we provide the pdfs to NWCC. They will optimize the pdf files and make them available through unique URLs. They can send us a text file with the station triplet IDs and the URLs of the station reports for our use.

  1. Could you provide watershed reports generated with 300, 150, and 75 dpi for one AOI? There is no need to provide the separate pdf files. NWCC will test their PDF optimization tool to see if they can reduce the file size while maintaining the resolution.
  2. We will provide some sample pdf files to NWCC so that they can test the setup of their "ftp" server. I can use the 45 files that we have on the NRCS AGOL for now.
  3. Let me know if you have preferred file format (or considerations) for the text file that they will generate. We will use this file to update the pdf URL information in our Master AOI list featureservice.
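
One candidate format for NWCC's text file is a two-column CSV with a header row, `stationTriplet,url`. The sketch below reads it into a lookup table and rejects blank or duplicate rows. The column names and `read_url_list` are suggestions rather than an agreed format, and the example URL is a placeholder.

```python
import csv
import io
from typing import Dict, TextIO


def read_url_list(src: TextIO) -> Dict[str, str]:
    """Parse a stationTriplet,url file into a triplet -> URL lookup."""
    urls: Dict[str, str] = {}
    for line_no, row in enumerate(csv.DictReader(src), start=2):  # line 1 is the header
        triplet = (row.get("stationTriplet") or "").strip()
        url = (row.get("url") or "").strip()
        if not triplet or not url:
            raise ValueError(f"line {line_no}: missing triplet or url")
        if triplet in urls:
            raise ValueError(f"line {line_no}: duplicate triplet {triplet}")
        urls[triplet] = url
    return urls


if __name__ == "__main__":
    sample = ("stationTriplet,url\n"
              "12045500:WA:USGS,https://example.org/reports/12045500_WA_USGS.pdf\n")
    print(read_url_list(io.StringIO(sample)))
```

Keying on the station triplet matches how the master AOI list is already looked up, so the same table could drive the URL update later.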

lbross commented 2 years ago

My thoughts on the aforementioned items.

  1. I will do this within the next few days. High-priority item.
  2. Nothing for me to do here, as @jdduh will be using files that are already posted to NRCS AGOL.
  3. I'm not sure how to respond to this. I don't know how to update the pdf url information in the feature service. Hopefully we can postpone this for quite some time. If it's an automated process, it will take even more thought.

lbross commented 2 years ago

item #1: There are 3 versions of the Wind_R_at_Riverton Watershed Report on the NWCC AGOL portal. The one with no dpi specified in the title is at 150 dpi. We also have 'Wind_R_at_Riverton Watershed Report 75 dpi' and 'Wind_R_at_Riverton Watershed Report 300 dpi'. Let me know if you need any additional help with this.

edit: The 300 dpi report was generated with the new runoff code, but this AOI does not have a value for runoff in the .xlsx.

jdduh commented 2 years ago

Thanks!

jdduh commented 2 years ago

NWCC is able to compress the pdf without losing map detail. The optimized 300 dpi file (24.5 MB) looks better than the 150 dpi original file (44 MB). We will generate all reports at 300 dpi.


lbross commented 2 years ago

Let's use the existing issue #17 to discuss aforementioned item #3: "Let me know if you have preferred file format (or considerations) for the text file that they will generate. We will use this file to update the pdf URL information in our Master AOI list featureservice."