CodeforNepal / akshara-project

Sangraha is an initiative to promote the usage of native language in computing and literature
https://www.sangraha.org/
GNU General Public License v3.0

ETL (Extract Transform Load) #3

Closed (tux4 closed this issue 6 years ago)

tux4 commented 6 years ago

For the MVP we intend to keep things simple with just a frontend over Elasticsearch, but we don't want to pollute the Elasticsearch index when we import new material. We will first store all the disparate data sources in a staging store (SQL or something else?) and then write an ETL in Python that loads them into Elasticsearch.
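A minimal sketch of that staging-then-ETL idea, to make the discussion concrete. Everything named here is an assumption, not a decision: SQLite stands in for the still-open "SQL or something else" choice, and the `documents` table, its columns, and the `akshara` index name are hypothetical. The load step only talks to Elasticsearch if a client is passed in; otherwise it returns the actions as JSON so the transform can be inspected without touching the index.

```python
import json
import sqlite3


def create_staging(conn):
    # Hypothetical staging schema: one row per raw document from any source.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, "
        "title TEXT, body TEXT NOT NULL)"
    )


def extract(conn):
    # Extract: stream rows out of the staging table in a stable order.
    cur = conn.execute("SELECT id, source, title, body FROM documents ORDER BY id")
    for row in cur:
        yield row


def transform(row, index="akshara"):
    # Transform: clean up whitespace and shape the row as a bulk action.
    # The "_id" is derived from source and staging id, so re-running the ETL
    # overwrites the same Elasticsearch document instead of duplicating it.
    doc_id, source, title, body = row
    return {
        "_index": index,
        "_id": f"{source}-{doc_id}",
        "_source": {
            "source": source,
            "title": (title or "").strip(),
            "body": body.strip(),
        },
    }


def load(actions, client=None):
    # Load: with no client, do a dry run and return the actions as JSON lines.
    if client is None:
        return [json.dumps(action, ensure_ascii=False) for action in actions]
    # Assumes the official `elasticsearch` Python package is installed.
    from elasticsearch import helpers

    return helpers.bulk(client, actions)
```

A dry run would be `load(transform(row) for row in extract(conn))`; passing an `Elasticsearch` client as the second argument would send the same actions through the bulk helper.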

Outcomes

  1. Ingestion

    • We will have a command-line script that lets anyone ingest data into our system
    • A well-defined and documented workflow for adding new documents to Elasticsearch
  2. Content

    • Scripts that crawl the various sites among our sources and index their data in Elasticsearch
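As a starting point for the command-line ingestion script in the outcomes above, here is a rough sketch. It assumes input arrives as a JSON Lines file with a required `body` field and an optional `title`, and that documents are staged in a SQLite file before any ETL run; the file format, the `--source` and `--db` flags, and the `documents` table are all hypothetical and only meant to show the shape of the workflow.

```python
import argparse
import json
import sqlite3


def ingest(conn, source, lines):
    # Stage each JSON Lines record; blank lines are skipped.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, "
        "title TEXT, body TEXT NOT NULL)"
    )
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        conn.execute(
            "INSERT INTO documents (source, title, body) VALUES (?, ?, ?)",
            (source, record.get("title"), record["body"]),
        )
        count += 1
    conn.commit()
    return count


def build_parser():
    parser = argparse.ArgumentParser(
        description="Stage documents for a later ETL run into Elasticsearch."
    )
    parser.add_argument("input", help="path to a JSON Lines file of documents")
    parser.add_argument("--source", required=True, help="label for the data source")
    parser.add_argument("--db", default="staging.db", help="SQLite staging file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    conn = sqlite3.connect(args.db)
    try:
        with open(args.input, encoding="utf-8") as handle:
            count = ingest(conn, args.source, handle)
    finally:
        conn.close()
    print(f"staged {count} documents from {args.source}")
    return 0
```

Saved as a script that calls `main()` under an `if __name__ == "__main__":` guard, this could be run as `python ingest.py docs.jsonl --source example-site`. Crawler scripts for individual sites could write the same JSON Lines format, so the staging step stays identical for every source.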