ActoKids / AD440_W19_CloudPracticum

Mapping from website to model #46

Closed mrvirus9898 closed 5 years ago

mrvirus9898 commented 5 years ago

Now that the basic mapping is done, we need code that operates on incoming data and formats it into an ideal JSON object. Write a script that accepts a reasonable string and converts it to a JSON object that complies with the API team's keys. Writing to a DB is not necessary yet.

Please indicate the time spent on this, any issues you are having, any good references you found on this subject, and credit anyone who helped you out.

mrvirus9898 commented 5 years ago

Estimated time: 8 hours. Current time spent: 2 hours.

I spent time looking into how Python handles regex and whether any packages would let me easily implement a mapping scheme.

mrvirus9898 commented 5 years ago

Estimated time: 8 hours. Current time spent: 12 hours.

Who needs regex and such when Python dictionaries exist! However, I did run into an issue getting BeautifulSoup to install correctly in Python, which is where I burned most of my time. I figured out a workaround and used Mike's crawler to pull a JSON of data from Eventbrite. I saved this JSON so that I could work with it without spamming Eventbrite.
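For anyone repeating this, here is a minimal sketch of the "save the JSON once, then work from the file" idea. The function name `load_cached` and the `fetch` callable are hypothetical and stand in for whatever the crawler actually exposes; this is not the code from the pull request.

```python
import json
from pathlib import Path


def load_cached(path, fetch):
    """Return JSON data from `path`; call `fetch()` only if the file is missing.

    `fetch` is a hypothetical zero-argument callable that returns
    JSON-serializable data (for example, a wrapper around the crawler).
    """
    cache = Path(path)
    if cache.exists():
        # Reuse the saved copy so the remote site is not hit again.
        return json.loads(cache.read_text(encoding="utf-8"))
    data = fetch()
    cache.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

The first call writes the file and later calls read it back, so the remote site is only contacted once per saved file.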

The proposed mapping scheme is a dictionary of key-value pairs, where each key is the old key from the crawled data and each value is the new key that our API team will be expecting. The mapping routine looks up the old key and returns the new, proper key if it is found.

To test this code, download both the JSON file and the .py file, and run the .py file from the same folder as the JSON file.
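A rough sketch of the dictionary approach, for illustration only. The key names in `KEY_MAP` are made up (the real ones live in the mappings file), and dropping unmapped keys is an assumption about what should happen when a key is not found.

```python
import json

# Hypothetical old-key -> new-key table; not the project's real mappings.
KEY_MAP = {
    "name": "event_name",
    "start": "start_date_time",
    "url": "event_link",
}


def map_keys(record, key_map=KEY_MAP):
    """Return a new dict with known keys renamed.

    Assumption: keys that are missing from `key_map` are dropped.
    """
    return {key_map[old]: value for old, value in record.items() if old in key_map}


def convert(raw):
    """Accept a JSON string and return a JSON string with the mapped keys."""
    return json.dumps(map_keys(json.loads(raw)), sort_keys=True)
```

For example, `map_keys({"name": "Swim", "foo": 1})` gives `{"event_name": "Swim"}`. Keeping the table as plain data means new sites can be supported by editing the dictionary rather than the code.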

Link to pull request: https://github.com/ActoKids/web-crawler/pull/4

toddysm commented 5 years ago

We need an overview page on the Wiki explaining where this mapping happens, whether it is for a single website or for every website, etc. That would give a starting point for the implementation.

Also, was any testing done, and were any bugs resolved or filed?

mrvirus9898 commented 5 years ago

https://github.com/ActoKids/web-crawler/wiki/Mappings

The mappings file is to be edited and updated as needed.