doccano / doccano

Open source annotation tool for machine learning practitioners.
MIT License
9.51k stars 1.72k forks

[Documentation] Add information about the possibility to import metadata in dataset #2041

Open Dynnammo opened 1 year ago

Dynnammo commented 1 year ago

:warning: Disclaimer: I'm new to this project as a contributor (though I've been a Doccano user for two years, and your product is 🔥). This issue is about documentation. I know you recommend opening a PR first, but since my time is a bit constrained, I would first like some insight into whether this is needed. :warning:

Context

As an analyst, I work a lot on text classification tasks where the texts are extracted from an external platform. Each text chunk has an ID that I must keep after annotation, which I was able to handle thanks to the metadata field when importing data.

However, going through the documentation (https://doccano.github.io/doccano/tutorial/), in the section on importing a dataset, I find no trace of this possibility, which can be quite useful.

I would like to add a part to the "Import a dataset" section that explains how the import file can carry metadata and, more broadly, information that is shown next to annotations. Is this suggestion worthwhile?

System information

github-actions[bot] commented 1 year ago

Would you write your environment? Thank you!

david-engelmann commented 1 year ago

If you upload the inputs in a JSON-based format (JSON/JSONL), any key in the JSON that doesn't match the "Column Label" text will be uploaded as metadata.

E.g. {"text": "blah", "label": "blah", "your_thing": "blah"}
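To make this concrete, here is a minimal Python sketch of preparing such a JSONL file with an extra ID key. The key name `external_id`, the sample texts, and the `split_metadata` helper are all hypothetical, and the column names `text` and `label` are assumed to be the ones set at import time. The helper only mirrors the behavior described above; it is not doccano's own code.

```python
import json

# Assumed column names; use whatever you set as the text and label
# columns when importing.
TEXT_COLUMN = "text"
LABEL_COLUMN = "label"


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def split_metadata(record):
    """Illustrative only: keys other than the text and label columns
    are treated as metadata, as described in the comment above."""
    known = {TEXT_COLUMN, LABEL_COLUMN}
    data = {k: v for k, v in record.items() if k in known}
    meta = {k: v for k, v in record.items() if k not in known}
    return data, meta


# "external_id" is a hypothetical extra key carrying the source platform's ID.
records = [
    {"text": "Great product", "label": "positive", "external_id": "A-001"},
    {"text": "Too slow", "label": "negative", "external_id": "A-002"},
]

jsonl = to_jsonl(records)
first = json.loads(jsonl.splitlines()[0])
data, meta = split_metadata(first)

print(jsonl)  # content to save as a .jsonl file and upload
print(meta)   # the part that would end up as metadata
```

If the import behaves as described, the `external_id` value should travel with each example as metadata, so it can be matched back to the external platform after annotation.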

Dynnammo commented 1 year ago

Yes, I got it. What I was asking is precisely whether it would be relevant to add your answer to the documentation. Currently (or maybe I read it wrong), the "Import dataset" section says very little about this.

david-engelmann commented 1 year ago

@Dynnammo We could add in some documentation for it. I'd also like to see the relation labels supported so we could document that as well!

Dynnammo commented 1 year ago

I don't think I have worked with what you call relation labels yet, but I'm ready to learn more about them ^^. You can assign me to this issue; I'll try to find some time to work on it by the end of the week :+1:

Hironsan commented 1 year ago

Hello, @Dynnammo

Great suggestion. I'm looking forward to your work.

david-engelmann commented 1 year ago

@Dynnammo I have an open issue about supporting relation labels that should point you in the right direction: https://github.com/doccano/doccano/issues/2039. If there is anything I can help with, please let me know!