gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

Bulk editing #55

Open tucotuco opened 4 years ago

tucotuco commented 4 years ago

I can easily imagine that, especially but not only in the early stage of thesaurus building, contributors would benefit from being able to grab the whole vocabulary, work on it in bulk, say in OpenRefine, and then submit the results as an update.

tucotuco commented 3 years ago

We have a situation now where the bird collection community would like to do exactly this with the terms lifeStage, sex, and preparations. We'll have a call with them on 2021-01-25 to help direct their efforts toward the best impact and to show them what has already been done with lifeStage.

ManonGros commented 3 years ago

Thanks @tucotuco, do you mean having the possibility to upload (alternative and/or hidden) labels in bulk for a given concept, or perhaps for several concepts?

I also think that would be useful as most of the processing for verbatim terms is done in other programs. That being said, I am not sure what would be the best way to do that.

Is this what you had in mind?

@marcos-lg @thomasstjerne any thoughts on the topic?

tucotuco commented 3 years ago

@ManonGros I expect this community will want to make contributions at nearly every level of the vocabulary - concepts, labels in multiple languages, and hidden labels. I have extracted the distinct values of All GBIF Bird Preserved Specimens for four terms - sex, lifeStage, preparations, and reproductiveCondition. I also chopped those up into clauses and words to ease the workload. We introduced the lifeStage vocabulary as it stands now and they are keen to begin with that and to have a framework to work within. We're trying to figure out from here what is the best approach for them to work on all of this in their workshop and beyond. Whatever comes out of it we can manipulate for bulk incorporation afterward, but it would be good to start thinking about what that should look like.

marcos-lg commented 3 years ago

Currently we can export the vocabularies in JSON format at this endpoint:

http://api.gbif.org/v1/vocabularies/LifeStage/concepts/export

It's in JSON because that's the easiest format for the API to understand, but it's probably not the best one for bulk editing. However, we could use it as a starting point and turn it into something else, like CSV or another format that is easy to use in spreadsheets, and then import those files into the API.

Any format is fine with me; I'm happy to use whatever the editors feel comfortable with. The most important thing is probably to use the same languages that the API supports - we have an enum (https://github.com/gbif/vocabulary/blob/master/model/src/main/java/org/gbif/vocabulary/model/enums/LanguageRegion.java) with those values. We could also have one language as a default if that makes things simpler. As for the other fields, we can ignore the keys and the audit ones (createdBy, created, etc.), and there is no need to fill all the fields out. Then, when we import these files, the endpoint can respond with the concepts that were not imported because of conflicts with other concepts.

tucotuco commented 3 years ago

Sounds good. The JSON is indeed useful, and online JSON to CSV converters can turn that into a form easily usable in a spreadsheet. That's what I did to show the NAOC group the structure and content of the lifeStage vocab in a form that they might use for editing.
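
The JSON-to-CSV step discussed above could also be scripted instead of using an online converter. The following is only a minimal sketch: it assumes the export is a list of concept objects, and the field names (`name`, `label`, `definition`, each keyed by language code) as well as the sample values are hypothetical and not taken from the actual export format. Keys and audit fields are dropped simply by not copying them.

```python
import csv
import io
import json

# Hypothetical excerpt of an export; the real structure may differ.
SAMPLE_EXPORT = """
[
  {"key": 1, "name": "Adult", "createdBy": "someone",
   "label": {"en": "Adult"}, "definition": {"en": "A fully grown individual."}},
  {"key": 2, "name": "Juvenile", "createdBy": "someone",
   "label": {"en": "Juvenile"}}
]
"""

# Only these columns are written; keys and audit fields are ignored.
COLUMNS = ["name", "label", "definition"]


def concepts_to_csv(concepts, language="en"):
    """Return CSV text with one row per concept for a single language."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for concept in concepts:
        writer.writerow([
            concept["name"],
            # Missing labels or definitions become empty cells.
            concept.get("label", {}).get(language, ""),
            concept.get("definition", {}).get(language, ""),
        ])
    return buffer.getvalue()


csv_text = concepts_to_csv(json.loads(SAMPLE_EXPORT))
print(csv_text)
```

Fields that are left empty stay empty in the CSV, which matches the idea that editors do not need to fill everything out.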

timrobertson100 commented 3 years ago

Having now created an 11 concept vocabulary in English, I can confirm that the UI as it currently stands is clunky, with a lot of selecting of language for each entry and tedious clicking around.

Tweaking a few edits is fine, but I agree we need a more efficient way to bulk edit. The workflow I think would work best would be something along the lines of:

  • User chooses to export for bulk editing, choosing which languages they work on (e.g. English, or English + Spanish)
  • User downloads a package that contains 4 spreadsheets for each language:
    1. The vocabulary definition (title, definition, URI etc.)
    2. A sheet with a single row for each concept where cells with multiple values are omitted
    3. A sheet for alternative labels containing only concept -> alternative label (one per row)
    4. A sheet for hidden labels containing only concept -> hidden label (one per row)
  • User can then upload this change-set for the languages included, which will either insert or update existing entries.
  • Deletions of concepts would need to be done in the tool itself

Doing it as a set of 4 sheets per language makes the workflow and spreadsheets easier, and one presumes people would mostly work on a single language, or two languages at a time.

tucotuco commented 3 years ago

This seems practical to me.
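
The per-language sheet layout proposed above could be sketched roughly as follows. This is an illustration only: the concept structure and the field names `alternativeLabels` and `hiddenLabels` (keyed by language code) are assumptions made for the example, not the project's actual model, and the vocabulary definition sheet is left out for brevity.

```python
def split_into_sheets(concepts, language="en"):
    """Split concepts into three row lists for one language.

    Returns (concept_rows, alternative_rows, hidden_rows), where the
    concept sheet has one row per concept and the two label sheets have
    one (concept, label) pair per row.
    """
    concept_rows, alternative_rows, hidden_rows = [], [], []
    for concept in concepts:
        name = concept["name"]
        # Single-valued cells only; multi-valued fields go to their own sheets.
        concept_rows.append((name, concept.get("label", {}).get(language, "")))
        for label in concept.get("alternativeLabels", {}).get(language, []):
            alternative_rows.append((name, label))
        for label in concept.get("hiddenLabels", {}).get(language, []):
            hidden_rows.append((name, label))
    return concept_rows, alternative_rows, hidden_rows


# Hypothetical sample data for illustration.
sample_concepts = [
    {
        "name": "Adult",
        "label": {"en": "Adult"},
        "alternativeLabels": {"en": ["Grown", "Mature"]},
        "hiddenLabels": {"en": ["ad.", "adlt"]},
    },
    {"name": "Juvenile", "label": {"en": "Juvenile"}},
]

concept_rows, alternative_rows, hidden_rows = split_into_sheets(sample_concepts)
for row in alternative_rows + hidden_rows:
    print(row)
```

Keeping each label on its own row means an editor can add or remove a label without touching the concept sheet, which seems to be the point of separating sheets 3 and 4.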

marcos-lg commented 3 years ago

I was talking with @timrobertson100 last week about this and we were thinking that it's probably worth exploring the option of doing the bulk editing in the UI, since it can be friendlier to use and the changes would be saved immediately (as in Google Sheets).

As a first suggestion, I was thinking that we can split the concept fields into multiple tables so they are small and easier to handle. We could use tabs to switch between tables. The tables would be editable (something like this https://ant.design/components/table/#components-table-demo-edit-cell).

I did a very simple sketch to show how we could organize the tables:

[image: sketch_bulk_editing] https://user-images.githubusercontent.com/25691745/126150223-a66eea7d-6691-48d1-a8c4-14939d5c61c7.png

If definitions become large we could use another tab menu for languages as in the alternative labels.

This could be an "edit view" only for the purpose of bulk editing.

tucotuco commented 3 years ago

Maybe I misunderstand, but for bulk editing I imagine doing a lot quickly. This proposal doesn't look like it satisfies the "quickly" part because of all the selections that have to be made even for one concept. This looks like a detail editing mode, not a bulk editing mode. As a detail editing interface I think it's great.

CecSve commented 6 months ago

We now have the option to bulk edit hidden values by importing them separately: https://github.com/gbif/vocabulary/tree/dev/vocabulary-importer#import-of-hidden-labels - does this satisfy the need @tucotuco or is there more to it?

I have created another issue for the possibility of supporting more languages: https://github.com/gbif/vocabulary/issues/129.

tucotuco commented 6 months ago

@CecSve Yes, that looks like a perfectly satisfactory bulk editing solution.