christabor / plantstuff

Warning! messy/unstable! :herb: :evergreen_tree: :maple_leaf: :leaves: :hibiscus: Utilities for retrieving, computing, organizing, and creating plant/horticulture data from various sources.
MIT License
6 stars, 0 forks

Document scraping and schema assembly process and requirements #10

Open christabor opened 6 years ago

christabor commented 6 years ago

Leveraging scraped data is vital to the project, but it poses many problems.

We need to document them in detail so we can define a clear path to mitigate or address each one.

For example, when gathering scraped data, source A may contain one set of data, while source B contains a subset of it, or a different set entirely.

In this scenario, problems emerge:

There is also the possible need to provide attribution, so that data trust can be assessed at a later point in time.
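To make the overlap and attribution concerns concrete, here is a minimal sketch in Python of one way scraped records could be merged while remembering where each value came from. Nothing here exists in the repository: `Attributed`, `merge_with_attribution`, `conflicts`, the source names, and the field values are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributed:
    """A single field value plus the name of the source it came from."""
    value: object
    source: str


def merge_with_attribution(records):
    """Merge per-source records into one record, keeping every value's origin.

    `records` maps a source name to a dict of field -> value. The result maps
    each field to a list of Attributed entries, one per source that supplied it.
    """
    merged = {}
    for source, fields in records.items():
        for field, value in fields.items():
            merged.setdefault(field, []).append(Attributed(value, source))
    return merged


def conflicts(merged):
    """Return only the fields on which the sources disagree."""
    return {
        field: entries
        for field, entries in merged.items()
        if len({repr(entry.value) for entry in entries}) > 1
    }


# Made-up example data: source_b overlaps source_a on some fields,
# disagrees on one, and adds a field that source_a lacks.
scraped = {
    "source_a": {"common_name": "red maple", "height_m": 27},
    "source_b": {"common_name": "red maple", "height_m": 30, "bloom": "spring"},
}

merged = merge_with_attribution(scraped)
disputed = conflicts(merged)
```

Keeping every value alongside its source, rather than picking a winner at scrape time, defers the trust decision: a later step can rank sources or flag disputed fields such as `height_m` for review.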