data-lessons / library-webscraping-DEPRECATED

Web scraping lesson for librarians, NOW MOVED to https://github.com/LibraryCarpentry/lc-webscraping

Introspection after ResBaz Sydney 2017 lesson #41

Closed: jnothman closed this issue 6 years ago

jnothman commented 7 years ago

This afternoon, I had 3 hours (including a 10-minute break) to present web scraping. I presented from https://ctds-usyd.github.io/2017-07-03-resbaz-webscraping/. I am not a trained SWC instructor, and I am not used to the narrative format of SWC lessons. I am also an experienced software engineer, so while I am used to some amount of teaching, it was hard for me to recall how much groundwork this topic requires. In the context of ResBaz, I was presenting to a group of research students, librarians, possibly some academics, etc. from Sydney universities. I did not run any kind of survey, but I hope to ask the ResBaz organisers to email students for their comments.

There were about 22 students, though 40 had signed up. Despite the Library Carpentry resolutions of a few weeks ago to focus on coding scrapers, I had decided to make something accessible to non-coders. In the end, we did not cover the coding part at all. I don't think we suffered greatly for this.

What we managed to cover

We covered, perhaps, half the material:

Good points

Things deserving attention

Overall

CSS selectors

Visual scraping


I'll contribute my lesson material to this repo shortly.

Anything to add, @nikzadb, @anushi, @RichardPBerry?

ostephens commented 7 years ago

Thanks so much for this write-up, @jnothman.

It feels like there could be room for different web scraping lessons here - an 'intro to web scraping with tools' - focus on a tool, include introduction to HTML/CSS; and a more advanced lesson - possibly 'web scraping with Python'.

I could see this being multiple episodes within a single lesson, but it would have to be clear that the intention wasn't to use all the episodes in one teaching session. (@drjwbaker has previously suggested a similar approach to me for the OpenRefine lesson.)

I feel that any tool we introduce should follow the selectors we are teaching: if we are teaching CSS selectors, it seems odd to then use a tool with a similar but different selector syntax.

Was there any feedback from the participants in terms of how useful they found it and whether it met with their expectations?

jnothman commented 7 years ago

Yes, having some optional components makes sense, but in any case a lesson will be cropped to fit its schedule and audience when it is presented. One question is whether these should be alternatives (hard to maintain) or extensions (which still have their challenges).

I did not collect feedback in an organised manner, but I hope to get a list of participant emails so I can ask for feedback after the fact. (In focusing on developing the lesson, I didn't prepare enough for that aspect.) I had one strong "I'm struggling" response between CSS selectors and visual scraping. I had the sense that most other people were following along well and asking appropriate questions about the exercises, and I got a couple of positive comments.

(Sent by email in reply to Owen Stephens's comment of 4 Jul 2017, 6:25 pm.)

RichardPBerry commented 7 years ago

Great workshop, and great summary, @jnothman; I think you picked out all the key points. I will add that, having attended with absolutely zero web scraping experience, I got a lot out of it!

I agree that this could be broken down into a basic (visual) lesson and an advanced (code-based) lesson. The mechanics of how best to do that I would leave to you... :)

The only things I would add are:

a) Some diagrams, or perhaps looking at the structure of a very simple webpage using the element inspector, might be a good way to give those not familiar with HTML a more solid grounding (looking at the structure of the course material page can be a bit overwhelming).

b) Maybe after introducing one or two selectors, it would be good to jump straight into the visual scraper tool and try it out on the simple webpage. This could be followed by the more in-depth discussion of the various CSS selectors and the UNSC example. I think this would help cement the concept and break up the theoretical discussion at the start.

One last point: personally, I think the UNSC example is really good. The quirks of this site show how difficult good scraping can be.