OpenBeta / openbeta-graphql

The open source rock climbing API
https://openbeta.io
GNU Affero General Public License v3.0

feat(api): add /import endpoint to support bulk importing json data #390

Silthus closed this pull request 9 months ago

Silthus commented 9 months ago

Hey Openbeta Team :)

As discussed in https://github.com/OpenBeta/open-tacos/issues/1031, it would be great to have a page where users can bulk-import data using a simple, validated JSON schema.

This PR adds the required functionality to the backend, enabling the integration of a bulk import in the frontend. The frontend is also actively being worked on by @l4u532.

What this PR does:

Let me know what you think. I will happily make any adjustments to the PR.

Silthus commented 9 months ago

Thanks @vnugent for the feedback on Discord about discussing the pros and cons of using the existing GraphQL API vs. creating a new REST endpoint first.
For documentation purposes, I will continue the discussion here.


First off: you are absolutely right to discuss the pros and cons before going forward with such a change. You have built an amazing product, and we are very happy to hear your thoughts and feedback about the change we are about to implement.
Frankly, we were so deep into coding that we forgot to share our train of thought. So here is the pro and con list we went through before deciding to go with the extra /import endpoint. We are eager to hear your thoughts and are more than willing to change things if you disagree or have concerns.

REST vs GraphQL for Bulk-Import of Data

[Draft] Example import.json

```json
{
  "areas": [
    {
      "areaName": "Utah",
      "countryCode": "us",
      "children": [
        {
          "areaName": "Southeast Utah",
          "children": [
            {
              "areaName": "Indian Creek",
              "children": [
                {
                  "areaName": "Supercrack Buttress",
                  "climbs": [
                    {
                      "name": "The Key Flake",
                      "grade": "5.10",
                      "fa": "unknown",
                      "type": { "trad": true },
                      "safety": "UNSPECIFIED",
                      "metadata": {
                        "lnglat": {
                          "type": "Point",
                          "coordinates": [-109.54552, 38.03635]
                        },
                        "left_right_index": 1
                      },
                      "content": {
                        "description": "Cool off-width that requires off-width and face skills.",
                        "protection": "Anchors hidden up top. Need 80m to make it all the way down.",
                        "location": "Opposite keyhole flake. Obvious right leaning offwidth that starts atop 20 ft boulder."
                      }
                    },
                    {
                      "name": "Incredible Hand Crack",
                      "grade": "5.10",
                      "fa": "Rich Perch, John Bragg, Doug Snively, and Anne Tarver, 1978",
                      "type": { "trad": true },
                      "safety": "UNSPECIFIED",
                      "metadata": {
                        "lnglat": {
                          "type": "Point",
                          "coordinates": [-109.54552, 38.03635]
                        },
                        "left_right_index": 2
                      },
                      "content": {
                        "description": "Route starts at the top of the trail from the parking lot to Supercrack Buttress.",
                        "protection": "Cams from 2-2.5\". Heavy on 2.5\" (#2 Camalot)",
                        "location": ""
                      },
                      "pitches": [
                        {
                          "pitchNumber": 1,
                          "grade": "5.10",
                          "disciplines": { "trad": true },
                          "length": 100,
                          "boltsCount": 0,
                          "description": "A classic hand crack that widens slightly towards the top. Requires a range of cam sizes. Sustained and excellent quality."
                        },
                        {
                          "pitchNumber": 2,
                          "grade": "5.9",
                          "disciplines": { "trad": true },
                          "length": 30,
                          "boltsCount": 0,
                          "description": "Easier climbing with good protection. Features a mix of crack sizes. Shorter than the first pitch but equally enjoyable."
                        }
                      ]
                    }
                  ],
                  "gradeContext": "US",
                  "metadata": {
                    "isBoulder": false,
                    "isDestination": false,
                    "leaf": true,
                    "lnglat": {
                      "type": "Point",
                      "coordinates": [-109.54552, 38.03635]
                    },
                    "bbox": [-109.54609091005857, 38.03590033981814, -109.54494908994141, 38.03679966018186]
                  },
                  "content": { "description": "" }
                }
              ],
              "metadata": {
                "lnglat": {
                  "type": "Point",
                  "coordinates": [-109.5724044642857, 38.069429035714286]
                }
              },
              "content": {
                "description": "Indian Creek is a crack climbing mecca in the southeastern region of Utah, USA. Located within the [Bears Ears National Monument](https://en.wikipedia.org/wiki/Bears_Ears_National_Monument)."
              }
            }
          ]
        }
      ]
    }
  ]
}
```
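A structural check of this document can be sketched in plain TypeScript. This is only an illustration of the validation idea; the PR itself validates against a JSON schema, and the names `ImportArea`, `ImportClimb`, and `validateImport` are assumptions, not the PR's actual types.

```typescript
// Recursive structural check for the import document. Field names
// mirror the example above; type and function names are hypothetical.
interface ImportClimb {
  name: string
  grade?: string
}

interface ImportArea {
  areaName: string
  countryCode?: string
  children?: ImportArea[]
  climbs?: ImportClimb[]
}

function validateArea (area: any, path: string, errors: string[]): void {
  if (typeof area?.areaName !== 'string' || area.areaName.length === 0) {
    errors.push(`${path}: missing or empty "areaName"`)
  }
  for (const [i, child] of (area?.children ?? []).entries()) {
    validateArea(child, `${path}.children[${i}]`, errors)
  }
  for (const [i, climb] of (area?.climbs ?? []).entries()) {
    if (typeof climb?.name !== 'string' || climb.name.length === 0) {
      errors.push(`${path}.climbs[${i}]: missing or empty "name"`)
    }
  }
}

function validateImport (doc: { areas?: ImportArea[] }): string[] {
  const errors: string[] = []
  if (!Array.isArray(doc?.areas)) {
    errors.push('root: "areas" must be an array')
    return errors
  }
  doc.areas.forEach((a, i) => validateArea(a, `areas[${i}]`, errors))
  return errors
}
```

Collecting error paths instead of throwing on the first problem lets the frontend show the user everything that is wrong with an uploaded file at once.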

Bulk Importing on the Server Side Using a New REST Endpoint

This is done using a separate endpoint, swapping the apollo-server package for the apollo-server-express package (see here for more details).
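As a rough sketch of that wiring (a server-setup fragment, not the PR's code: the `/import` path follows the PR title, while `importJson` is a hypothetical stand-in for the DB operations in import-json.ts):

```typescript
// Mount a plain REST route next to the GraphQL server by switching
// from `apollo-server` to `apollo-server-express` (v3-style API).
import express from 'express'
import { ApolloServer } from 'apollo-server-express'
import { typeDefs, resolvers } from './schema' // hypothetical module
import { importJson } from './import-json'     // hypothetical export

async function start (): Promise<void> {
  const app = express()
  const server = new ApolloServer({ typeDefs, resolvers })
  await server.start()
  server.applyMiddleware({ app, path: '/' })

  // The extra REST endpoint for bulk imports
  app.post('/import', express.json({ limit: '10mb' }), async (req, res) => {
    try {
      // importJson would wrap all writes in a single transaction
      const result = await importJson(req.body)
      res.status(200).json(result)
    } catch (e) {
      res.status(400).json({ error: (e as Error).message })
    }
  })

  app.listen(4000)
}
```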

Pros

Cons

Importing via GQL

This would be done by processing the JSON document on the client, constructing a GraphQL query, and sending it to the server.
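A minimal sketch of the client side of this variant (the mutation name `bulkImportAreas` and the input shape are assumptions for illustration):

```typescript
// Build the HTTP body for a hypothetical GraphQL bulk mutation from a
// parsed import document.
const BULK_IMPORT_MUTATION = `
  mutation BulkImport($input: BulkImportInput!) {
    bulkImportAreas(input: $input) {
      addedAreaIds
    }
  }
`

function buildBulkImportRequest (doc: object): string {
  return JSON.stringify({
    query: BULK_IMPORT_MUTATION,
    variables: { input: doc }
  })
}

// The body would then be POSTed to the GraphQL endpoint, e.g.:
// fetch('/graphql', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: buildBulkImportRequest(doc)
// })
```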

Pros

Cons

Conclusion

We believe that doing the little bit of extra work and implementing a custom endpoint for importing bulk data is the right way to go.

The main reason for that is transaction safety: a single .json file can be imported with multiple retries to recover from possible errors, without aborting the import partway through and leaving the database in a partial state.
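The retry-inside-a-transaction idea can be sketched generically (names are hypothetical; the real implementation would run `work` inside a MongoDB session via `withTransaction`, so a failed attempt discards all of its writes):

```typescript
// All-or-nothing import with bounded retries: either every write in
// `work` commits, or the attempt is abandoned and retried.
async function importWithRetries<T> (
  work: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      // In the real code: session.withTransaction(work), so a failure
      // anywhere rolls back all writes of this attempt.
      return await work()
    } catch (e) {
      lastError = e
    }
  }
  throw lastError
}
```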

The second biggest reason we prefer the server-side import is the additional complexity the GraphQL variant would add whenever the GraphQL endpoints change. By separating the importer from the regular API, a new contributor or existing collaborator changing the GraphQL API does not need to touch the bulk importer's frontend. We also don't really know how the GraphQL approach would behave with large amounts of data.

Plans for the Future

Here are some additional ideas and thoughts we had regarding the import feature, which also influenced our decision:

Export -> Edit -> Import Workflow

The idea is to add an export function that allows exporting selected areas or regions with individual properties into a single json file which then can be edited and uploaded using the import page.
This would allow for a very easy way to update existing data. The export would be similar to the existing openbeta-export, except that it produces a single json file in the schema required by the import.
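Such an export boils down to a mapping from the stored area documents into the import schema shown above; a sketch, where the stored field names (`area_name`, `children`) are assumptions about the DB shape:

```typescript
// Map a stored area document (shape assumed) to the import schema,
// recursing over children.
interface StoredArea {
  area_name: string
  children?: StoredArea[]
}

interface ExportedArea {
  areaName: string
  children?: ExportedArea[]
}

function toImportSchema (area: StoredArea): ExportedArea {
  const out: ExportedArea = { areaName: area.area_name }
  if (area.children != null && area.children.length > 0) {
    out.children = area.children.map(toImportSchema)
  }
  return out
}
```

Because the export emits exactly the schema the importer validates, a round trip of export, edit, import needs no extra conversion step.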

Data Cleanup

One issue with OpenBeta is data quality. Adding such an export-and-import workflow could allow individuals, including non-developers, to start cleaning up data at a larger scale.

vnugent commented 9 months ago

@Silthus I'd be more comfortable if we still use GQL as the interface (not introducing REST in this PR). You can create a new GQL bulk mutation and reuse all the DB operations in import-json.ts. There's no need to call the existing update APIs.

Silthus commented 9 months ago

OK, I can understand why you are more comfortable with sticking to the GraphQL only API.

But I am not quite sure what you mean. Can you share a small example snippet or a link to some docs? I just need a direction to head in and will change the bulk import accordingly.

vnugent commented 9 months ago

You can piggyback on the Area GQL schema. I'm using the updateAreaSortingOrder mutation as an example.

  1. Add a new mutation definition to AreaEdit.gql, e.g. bulkUpload or whatever.
    • You'll need to explicitly define the input param. This is the schema/structure of the bulk json object. This helps with param validation and provides schema for GQL explorers.
    • Optionally, you can also create a Custom scalar to encapsulate the bulk upload payload. Example of existing custom scalar
  2. Add a GQL mutation mapping to JS model (analogous to REST 'controller')
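Putting the two steps together, a sketch of the SDL addition and the resolver mapping, following the updateAreaSortingOrder pattern (the mutation name, input type, and `bulkImportJson` model function are placeholders, not the repo's actual names):

```typescript
// 1. SDL added to AreaEdit.gql (shown as a string for brevity):
const typeDefs = `
  input BulkImportInput {
    areas: [AreaInput!]!
  }

  type Mutation {
    bulkImportAreas(input: BulkImportInput!): [ID!]
  }
`

// 2. Resolver mapping the mutation to the JS model layer (analogous
// to a REST controller). `bulkImportJson` stands in for the DB
// operations in import-json.ts.
interface AreaDataSource {
  bulkImportJson: (doc: unknown) => Promise<string[]>
}

const resolvers = {
  Mutation: {
    bulkImportAreas: async (
      _: unknown,
      { input }: { input: { areas: unknown[] } },
      { dataSources }: { dataSources: { areas: AreaDataSource } }
    ): Promise<string[]> => {
      return await dataSources.areas.bulkImportJson(input)
    }
  }
}
```

Defining the input type explicitly in SDL is what gives the param validation and GQL-explorer schema mentioned in step 1.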
Silthus commented 9 months ago

@vnugent thanks that helped a lot!

I adjusted the code accordingly; the only things left to fix are some failing tests where the input doesn't match up.

Let me know what you think of the changes.

I intentionally kept the swap to the apollo-server-express package in there to decouple the tests from a real running server listening on a localhost port. This significantly speeds up testing and reduces complexity without any disadvantages.

Silthus commented 9 months ago

I have a small request to check for both null and undefined. Other than that, we're good to merge.

Thanks, I learned something again today :)