GSA / sdg-indicators-usa

U.S. National Reporting Platform for the Sustainable Development Goals
https://sdg.data.gov
MIT License

Guidance on bulk-updating indicator data from CSV files #830

Open brockfanning opened 6 years ago

brockfanning commented 6 years ago

Hi all, below is a draft of some guidance for bulk-uploading CSV data. Feedback is welcome!

Bulk-updating indicator data from CSV files

Overview

These instructions are intended for any NRP data provider that would like to maintain their NRP data as local CSV files, which they periodically bulk-upload to the repository. The basic steps involved in that workflow are detailed below.

Download the current files

To bulk-update indicator data, first download the entire repository so that you can edit the included CSV data files with your own software (such as Excel).

  1. In a browser go to your repository on Github.com. Taking the U.S. platform as an example, the URL is: https://github.com/GSA/sdg-indicators
  2. Expand the “Clone or download” button
  3. Choose “Download zip”
  4. Unzip the downloaded file to a location of your choosing on your own computer

Edit the files as needed

In the unzipped folder, open the ‘data’ folder to find the relevant CSVs.

The details of this step will vary depending on what software you plan to use to edit the files. However, here are some important guidelines:

  1. Make sure that you do not accidentally change any of the column headers. In some cases, these headers are meaningful and could be connected to platform functionality. Your edits should be restricted to the numeric data itself.
  2. When finished editing, make sure that the data is saved once more as a CSV file, of the same name. For example, if using Excel:
    1. Choose the “Save as” option
    2. In the “Save as type” box, choose “CSV (comma delimited)”
    3. Do not change the name of the file
    4. Click OK/Yes for any dialogs that appear
  3. It is recommended that you save all the updated CSV files in a single folder
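
As a quick check on guideline 1, here is a minimal Python sketch that compares the header row of each edited CSV against the file of the same name in the downloaded ‘data’ folder. The function names and folder arguments are hypothetical, and it assumes the files can be read as UTF-8 (a CSV saved by Excel in another encoding may need a different `encoding` value).

```python
import csv
from pathlib import Path


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    # utf-8-sig also strips a byte-order mark if one is present.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])


def find_header_changes(original_dir, edited_dir):
    """Map each edited file name to (original_header, edited_header) when they differ."""
    changes = {}
    for edited in sorted(Path(edited_dir).glob("*.csv")):
        original = Path(original_dir) / edited.name
        if not original.exists():
            continue  # no same-named original to compare against
        before, after = read_header(original), read_header(edited)
        if before != after:
            changes[edited.name] = (before, after)
    return changes
```

An empty result means no header differences were found among files that exist in both folders; anything else is worth a second look before uploading.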

Bulk-upload the new files to Github.com

Once you are ready to upload the revised files, and “commit” the changes back to Github.com, follow these instructions.

  1. Once again, in a browser go to the repository on Github.com, as above.
  2. This time, you need to get to your “fork”. (“Fork” is a Github term used to refer to a copy of a repository.) The easiest way is to choose the “Fork” button at the top right. You should see “You already have a fork of this repository” followed by a link to your fork. Follow that link to go to your fork. (You can also bookmark this location to simplify this step in the future.)
  3. Once on your fork, scroll down in the list of files/folders, and choose the “data” folder.
  4. In the upper right, choose the “Upload files” button.
  5. Choose or drag/drop the CSV files you saved earlier. Multiple files can be chosen or dragged/dropped in the usual ways, such as by SHIFT-clicking to select many files.
  6. At the bottom, briefly explain the change you have made in the ‘Commit changes’ area. Select “Create a new branch for this commit and start a pull request.” This simplifies the process of bringing your changes to the attention of reviewers.
  7. Click “Propose changes”. Then, on the next page, choose “Create pull request”. This completes the process.

The reviewers will then receive the request and approve/comment as appropriate.

JenPark9 commented 6 years ago

Thank you, Brock. Should we understand that this method could be used to bulk upload files from other data systems into the github environment--such as requested by Ecuador and others? If so, might we suggest (through OAS--Ceceila perhaps) that a country test the instructions? If you agree, maybe Angela can connect with them?

brockfanning commented 6 years ago

@JenPark9 These instructions are for bulk-uploading CSV files from the user's local computer. If the other data systems can export their data to CSV files, and the column headers are correct, then it's possible. But it would be 2 separate steps:

  1. Export the data from the other data system as CSV files, and tweak the column headers as needed
  2. Use these instructions to bulk-upload those files into Github.com.
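
For step 1, a minimal Python sketch of one way to keep the column headers correct: copy the header row from an existing CSV in the repository and write the exported rows under it. The function name, the `rows` structure (a list of dicts keyed by column name), and the file paths are all hypothetical; how the rows are pulled out of the other data system is not covered here.

```python
import csv


def export_rows(rows, reference_csv, out_path):
    """Write rows (dicts) to out_path using the header row of reference_csv."""
    with open(reference_csv, newline="", encoding="utf-8-sig") as f:
        headers = next(csv.reader(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # DictWriter raises ValueError if a row has a key that is not in headers,
        # so an unexpected column is caught instead of silently written.
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)
```

Rows that lack a column are written with an empty cell for it, which is the `csv.DictWriter` default.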

Also, in these instructions, I included some steps referring to the downloading of the current CSV files from Github. That wouldn't be relevant if a country always updates their data in another system. If you'd like I can tweak these instructions to accommodate the possibility of a country managing their data in a separate system, and then only using Github to upload as a last step.

JenPark9 commented 6 years ago

Great. I am flagging for Angela and Stephanie so that they can follow up with potential use cases (either via OAS or others).

brockfanning commented 6 years ago

Here's an updated version that's also geared towards countries who might be managing their data separately from Github, and only uploading it to Github as a final step.

Bulk-updating indicator data from CSV files

Overview

These instructions are intended for any NRP data provider that would like to maintain their NRP data outside of Github, and then periodically bulk-upload the data as CSV files to Github. The basic steps involved in that workflow are detailed below.

Get the data from Github

If the data is already in Github, but needs to be migrated out in order to be maintained elsewhere, then this one-time task will be necessary: getting the data from Github. If the data is already being maintained outside of Github, this can be skipped.

IMPORTANT: However, if there is a chance that other data providers may be editing the data independently, then this step cannot be skipped.

  1. In a browser go to your repository on Github.com. Taking the U.S. platform as an example, the URL is: https://github.com/GSA/sdg-indicators
  2. Expand the “Clone or download” button
  3. Choose “Download zip”
  4. Unzip the downloaded file to a location of your choosing on your own computer
  5. In the unzipped folder, open the ‘data’ folder to find the relevant CSVs.

Maintain the data in whatever way you choose

The specifics of this step will vary greatly depending on what mechanism is used to maintain the data. Each data provider, if they choose to maintain their data outside of Github, will choose a data management solution. Some examples might be:

Regardless, here is an important rule of thumb to keep in mind:

  1. Make sure that you do not accidentally change any of the column headers. In some cases, these headers are meaningful and could be connected to platform functionality. Your edits should be restricted to the numeric data itself.

Export the data to CSV files

The details of this step will vary depending on what software/system you plan to use to maintain the data. However, here are some important guidelines:

  1. When finished editing, make sure that the data is saved once more as a CSV file, of the same name. For example, if using Excel:
    1. Choose the “Save as” option
    2. In the “Save as type” box, choose “CSV (comma delimited)”
    3. Do not change the name of the file
    4. Click OK/Yes for any dialogs that appear
  2. It is recommended that you save all the updated CSV files in a single folder
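
Because each exported file needs to keep the same name as the file it replaces, a small Python sketch like the following could list any CSVs in the export folder that have no same-named counterpart in the repository's ‘data’ folder. The function names and folder arguments are hypothetical, and the match is on exact file names ending in lowercase `.csv`.

```python
from pathlib import Path


def csv_names(folder):
    """Return the set of .csv file names directly inside folder."""
    return {p.name for p in Path(folder).glob("*.csv")}


def unexpected_file_names(data_dir, export_dir):
    """Return sorted names of CSVs in export_dir with no same-named file in data_dir."""
    return sorted(csv_names(export_dir) - csv_names(data_dir))
```

A non-empty result suggests a renamed or brand-new file, which would be added alongside the existing data rather than replacing it.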

Bulk-upload the new files to Github.com

Once you are ready to upload the revised files, and “commit” the changes back to Github.com, follow these instructions.

  1. Once again, in a browser go to the repository on Github.com, as above.
  2. This time, you need to get to your “fork”. (“Fork” is a Github term used to refer to a copy of a repository.) The easiest way is to choose the “Fork” button at the top right. You should see “You already have a fork of this repository” followed by a link to your fork. Follow that link to go to your fork. (You can also bookmark this location to simplify this step in the future.)
  3. Once on your fork, scroll down in the list of files/folders, and choose the “data” folder.
  4. In the upper right, choose the “Upload files” button.
  5. Choose or drag/drop the CSV files you saved earlier. Multiple files can be chosen or dragged/dropped in the usual ways, such as by SHIFT-clicking to select many files.
  6. At the bottom, briefly explain the change you have made in the ‘Commit changes’ area. Select “Create a new branch for this commit and start a pull request.” This simplifies the process of bringing your changes to the attention of reviewers.
  7. Click “Propose changes”. Then, on the next page, choose “Create pull request”. This completes the process.

The reviewers will then receive the request and approve/comment as appropriate.

brockfanning commented 6 years ago

@SmithersA just calling your attention to this one. See above for notes from Jen about possibly providing use-cases from the perspective of Ecuador or others.

SmithersA commented 6 years ago

@brockfanning ~ from what I recall (if I'm not mistaken), they wanted to link to the database information that existed on their system without the burden of EXPORTING the information into a spreadsheet, then converting it into a CSV, and then uploading. Can they just link to a spreadsheet (wherever it may be stored)?

brockfanning commented 6 years ago

@SmithersA Gotcha. I think this falls more along the lines of the "Database Linking" item in the development priorities. I've started #859 to explore that more.