bcavallo / Jazz-Schedule-Website

website that lists upcoming NYC jazz events in a cohesive way and includes integration with a personal calendar api. 50/50 chance of completion!

Scraping #1

Open bennn opened 8 years ago

bennn commented 8 years ago

This website needs data! Questions:

Q1. what sites do you want to scrape? Q2. what info do you want?

(If you answer Q1 I can try answering Q2 here, posting some ideas for database models)

bcavallo commented 8 years ago

Hey man -

Here are some links to start.

So I'm interested in schedules at:

http://www.55bar.com/ (Done-ish for scraping)
https://www.smallslive.com/events/calendar/
https://www.mezzrow.com/

to start. I haven't had a lot of time to put work in, but I should be wrapping up with the more intense work for this whole work project thing pretty soon.


bennn commented 8 years ago

swag 🐘

bennn commented 8 years ago

oh where's the scraping code for 55bar?

bcavallo commented 8 years ago

The function fifetyfive_get_shows (sp?) in app.py is where I'm putting it right now. 55 Bar has the worst-designed website I've ever seen, which is why the code looks so terrible. There might be a better way, but this function is what I have now. It returns a list or something that you send to the template, which then renders it on the page.
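For anyone following along, here is a rough sketch of that shape: a function that pulls show listings out of HTML and returns a list, plus a template step that renders the list. This is not the code in app.py. The names get_shows, ShowParser, and render, the `<li class="show">` markup, and the sample HTML are all made up for illustration, and it uses only the standard library rather than whatever the real app uses.

```python
from html import escape
from html.parser import HTMLParser
from string import Template


class ShowParser(HTMLParser):
    """Collect the text of every <li class="show"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.shows = []
        self._in_show = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "show") in attrs:
            self._in_show = True
            self.shows.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_show = False

    def handle_data(self, data):
        if self._in_show:
            self.shows[-1] += data


def get_shows(html):
    """Return a list of show descriptions found in the given HTML string."""
    parser = ShowParser()
    parser.feed(html)
    parser.close()
    return [show.strip() for show in parser.shows]


def render(shows):
    """Render the list of shows as an HTML list, escaping each entry."""
    row = Template("<li>$show</li>")
    return "<ul>" + "".join(row.substitute(show=escape(s)) for s in shows) + "</ul>"


# Made-up sample input; the real 55 Bar page is structured differently.
SAMPLE = '<ul><li class="show">7pm Trio</li><li class="show">10pm Quartet</li><li>ad</li></ul>'
page = render(get_shows(SAMPLE))
```

Keeping the scraping function separate from the rendering step means each new venue only needs its own function that returns the same kind of list.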

Ty by the way for helping. I should have more time soon, and then I'll put in some good hours.

BC


bcavallo commented 8 years ago

So I'm going to try to follow up using your method because no question it's cleaner and more professional-looking. Also - I haven't done any Python 3 yet, so that should be interesting to learn. I understand most of it, but I need to go through some of the details of Scrapy to understand it more fully. It'd also be great to chat a little about all of it when you get the chance, just so I see your logic a bit better as I go through future projects.

But so far it seems good and I'll build scrapers for some additional sites I think.

BC

bennn commented 8 years ago

Do you want to post line-by-line comments on the diff here? https://github.com/bcavallo/Jazz-Schedule-Website/pull/2/files

Scrapy was a Python 2.7 library first. I think the only differences here are print() and system calls returning bytes (instead of strings).
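A minimal sketch of those two differences, assuming Python 3 and using only the standard library. The child command here is just a stand-in for illustration, not something from the pull request.

```python
import subprocess
import sys

# Python 3: print is a function, so the parentheses are required.
print("scraping 55bar")

# Python 3: output captured from a system call comes back as bytes,
# so decode it before treating it as text.
raw = subprocess.check_output([sys.executable, "-c", "print('hello')"])
text = raw.decode("utf-8").strip()
```

In Python 2.7 the same check_output call would have returned a plain str, so the decode step is the part that tends to surprise people when moving over.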