USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Sparkler Elasticsearch storage engine #209

Closed: lewismc closed this issue 3 years ago

lewismc commented 3 years ago

We are moving ahead with this project within the USC CSCI 401 Senior Capstone Program. My stakeholder needs are as follows:

1. As a Crawler Administrator, I need to be able to specify and configure the Sparkler storage engine through configuration parameters.
2. As a Data Analyst, I need access to a dashboard that displays crawl information.
3. As a DevOps Engineer, I need to deploy Sparkler (configured with the Elasticsearch storage engine) via Docker and as a Helm Chart.
4. As a Test and Quality Assurance Engineer, I need to integrate Sparkler tests for the Elasticsearch storage engine into my CI process.
5. As a Development Lead, I need access to developer documentation covering the Elasticsearch storage engine for Sparkler.

Sparkler Committers, I wonder if it is possible for us to use the GitHub Projects feature to manage the project?

Capstone Team, please reply with your GitHub ID here.

Kefaun2601 commented 3 years ago

Github ID: 42557579

slhsxcmy commented 3 years ago

login: slhsxcmy id: 20136533

nhandyal commented 3 years ago

uid: nhandyal

felixloesing commented 3 years ago

My Github ID is felixloesing

KilometersFan commented 3 years ago

ID: 35278719, Username: KilometersFan

thammegowda commented 3 years ago

> Sparkler Committers, I wonder if it is possible for us to use the GitHub Projects feature to manage the project?

Yes, of course. @lewismc, let me know what permissions you need. FYI, I already sent an invitation to your account.

lewismc commented 3 years ago

@thammegowda, please add everyone mentioned in this thread as contributors. They will need to use the Projects feature to manage the ongoing project.

lewismc commented 3 years ago

@thammegowda, once done, please close this issue. THANK YOU very much, @thammegowda!

thammegowda commented 3 years ago

1. I created a team, @USCDataScience/csci401-2021spring.
2. I added all five members listed above to that team.
3. Everyone on the team has been granted Triage permission, which allows them to manage issues and pull requests.
4. I think their best course of action is to (1) clone this repo into their own space, (2) make their modifications, and (3) send pull requests so we can review and merge them. We will grant write permissions over time as they make progress and prove themselves (similar to the ASF).

lewismc commented 3 years ago

Thank you @thammegowda