ActoKids / AD440_W19_CloudPracticum


Google Calendar crawler: round one #42

Open mrvirus9898 opened 5 years ago

mrvirus9898 commented 5 years ago

It is time to crawl through some real data. A URL will be provided. Data from this crawler does not need to be mapped or formatted, only saved to DynamoDB via JSON object. Don't be afraid to pollute DynamoDB right now; quality assurance will come later.
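Here is a minimal sketch of the "save it to DynamoDB as a JSON object" step. It assumes boto3 and an already-created table. The partition key name `event_id` and the function names are hypothetical. Floats are converted to `Decimal` because the boto3 DynamoDB resource rejects Python floats. The table is passed in, so any object with a `put_item(Item=...)` method works, for example `boto3.resource("dynamodb").Table("<your table>")`.

```python
import json
import uuid
from decimal import Decimal
from typing import Any, Protocol


class PutItemTable(Protocol):
    """Anything with a boto3-style put_item(Item=...) method."""

    def put_item(self, *, Item: dict) -> Any: ...


def raw_event_to_item(raw_event: dict, source_url: str) -> dict:
    """Wrap an unmapped event dict as a DynamoDB item.

    The JSON round trip turns floats into Decimal, which the boto3
    resource API requires instead of float.
    """
    item = json.loads(json.dumps(raw_event), parse_float=Decimal)
    item["event_id"] = str(uuid.uuid4())  # hypothetical partition key name
    item["source_url"] = source_url
    return item


def save_raw_event(table: PutItemTable, raw_event: dict, source_url: str) -> dict:
    """Store one raw event without any mapping; QA happens later."""
    item = raw_event_to_item(raw_event, source_url)
    table.put_item(Item=item)
    return item
```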

Please indicate the time spent on this, any issues you are having, any good references you found on this subject, and credit anyone who helped you out.

Target URL http://www.seattleadaptivesports.org/calendar.html
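One way to find the calendars behind the target page is sketched below. It assumes the page shows them through a Google Calendar `<iframe>` embed, which has not been verified here. An embed URL lists each calendar as a `src=` query parameter, so the sketch needs only the standard library's `html.parser` and `urllib.parse`. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class CalendarEmbedFinder(HTMLParser):
    """Collects calendar IDs from Google Calendar <iframe> embeds."""

    def __init__(self) -> None:
        super().__init__()
        self.calendar_ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        src = dict(attrs).get("src") or ""
        url = urlparse(src)
        if url.netloc.endswith("google.com") and "/calendar/embed" in url.path:
            # An embed that shows several calendars repeats the src= parameter.
            self.calendar_ids.extend(parse_qs(url.query).get("src", []))


def find_calendar_ids(html: str) -> list[str]:
    """Return the decoded calendar IDs found in the page's embeds, in order."""
    finder = CalendarEmbedFinder()
    finder.feed(html)
    return finder.calendar_ids
```

You could fetch the page with `urllib.request.urlopen(url).read().decode("utf-8")` before passing it to `find_calendar_ids`.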

rberry206 commented 5 years ago

I worked a little on the parsing. I figured out that you can add Google Calendars together to consolidate them, so I made a dummy Gmail account and added the example calendar to it. I still need to figure out how to pull unstructured data; I will work a few hours on it this week.
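For pulling the unstructured event data, here is a sketch that assumes the calendar is public. In that case Google also serves the calendar as an iCalendar (`.ics`) feed at the URL built below. The parser unfolds continuation lines as RFC 5545 describes and keeps the raw property values without mapping or formatting, which matches the scope of this issue. It is deliberately simplified: parameters such as `TZID` are dropped, text escapes are left as-is, and a repeated property keeps only its last value.

```python
from urllib.parse import quote


def public_ics_url(calendar_id: str) -> str:
    # Assumes the calendar is public; Google serves its iCal feed at this path.
    return f"https://calendar.google.com/calendar/ical/{quote(calendar_id)}/public/basic.ics"


def unfold(ics_text: str) -> list[str]:
    """Join RFC 5545 folded lines (continuations start with a space or tab)."""
    lines: list[str] = []
    for line in ics_text.splitlines():
        if line[:1] in (" ", "\t") and lines:
            lines[-1] += line[1:]
        elif line:
            lines.append(line)
    return lines


def parse_vevents(ics_text: str) -> list[dict]:
    """Return each VEVENT as a flat dict of raw property strings."""
    events, current = [], None
    for line in unfold(ics_text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            name_and_params, value = line.split(":", 1)
            name = name_and_params.split(";", 1)[0]  # drop parameters like TZID
            current[name] = value
    return events
```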

rberry206 commented 5 years ago

LOOK AT COMMENT ABOVE

Estimated time: 6 hours. Total time spent so far: 1 hour.

The code hasn't changed, just the calendar I'm calling.

mrvirus9898 commented 5 years ago

I see, so we could have an admin calendar that acts as a pool for this data? That would really simplify our external API calls.

rberry206 commented 5 years ago

Yeah, it's pretty nice. I need to test overlapping events, but they should work fine.
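To build test cases for overlapping events, a small helper can list which events in the pooled calendar overlap in time, so you can check that both events in each pair came through. The `Event` tuple is a hypothetical stand-in for whatever the crawler actually produces. End times are treated as exclusive, so back-to-back events do not count as overlapping.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    summary: str
    start: datetime
    end: datetime


def overlapping_pairs(events: list[Event]) -> list[tuple[str, str]]:
    """Return (earlier, later) summaries for every pair whose times overlap.

    Events are sorted by start time, so each event only needs to be compared
    with the events that start before it ends.
    """
    ordered = sorted(events, key=lambda e: e.start)
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # sorted, so no later event can overlap `first`
            pairs.append((first.summary, second.summary))
    return pairs
```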

rberry206 commented 5 years ago

I got more info from the calendar and learned that if you put all of the calendars into one place, the email associated with each event changes to your personal email. I'm still looking for a workaround. If we can't find one, we may need multiple calls to multiple calendars, or we could just not associate an email with calendar events.
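If the workaround turns out to be "multiple calls to multiple calendars", a sketch with the Google Calendar API v3 Python client could look like this. It assumes a `service` built with `googleapiclient.discovery.build("calendar", "v3", credentials=...)`. It pages through `events().list` once per calendar and records the calendar ID each event came from in a hypothetical `source_calendar` field, instead of relying on the email the pooled calendar reports. Whether this fully solves the email problem is untested.

```python
from typing import Any


def fetch_events(service: Any, calendar_id: str) -> list[dict]:
    """Page through events.list for one calendar (Google Calendar API v3)."""
    events, page_token = [], None
    while True:
        response = service.events().list(
            calendarId=calendar_id, pageToken=page_token
        ).execute()
        events.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return events


def pool_calendars(service: Any, calendar_ids: list[str]) -> list[dict]:
    """Make one call per calendar and tag each event with its source calendar,
    so that field does not depend on the account that did the pooling."""
    pooled = []
    for calendar_id in calendar_ids:
        for event in fetch_events(service, calendar_id):
            pooled.append({**event, "source_calendar": calendar_id})
    return pooled
```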

Estimated time: 6 hours. Total time spent: 4 hours.

https://github.com/ActoKids/web-crawler/wiki/Google-Calendar-Crawler

https://github.com/ActoKids/web-crawler/pull/8

toddysm commented 5 years ago

Need some more comments in the code. Who tested it, and what was her/his feedback? What did you test?

rberry206 commented 5 years ago

https://github.com/ActoKids/web-crawler/pull/11

I added comments. Michael found the same results I did and he merged it to the dev branch.