Departmental site biography seed data format

ryanbungard commented 6 years ago

How much of the HTML formatting should we keep from the biographies scraped/copied from departmental sites? Currently,

tags are kept. Should we keep links?

Also, there may be a bug keeping

tags. This example was brought up: http://cs-dev.cmu.edu/directory/john_galeotti vs. http://ri.cmu.edu/ri-faculty/john-galeotti/
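
One way to frame the "which tags do we keep" question is as an explicit allowlist applied at import time. The following is a minimal sketch using only the Python standard library, not the site's actual import code. The names `ALLOWED_TAGS`, `AllowlistSanitizer`, and `sanitize_statement` are hypothetical, and the choice of `p`, `br`, and `a` (with `href` only) is an assumption for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes that survive sanitizing.
ALLOWED_TAGS = {"p": set(), "br": set(), "a": {"href"}}
VOID_TAGS = {"br"}                # tags that never get a closing tag
SKIP_CONTENT = {"script", "style"}  # tags whose text content is dropped too


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags, drops all other tags but keeps their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still reaches handle_data
        kept = [(k, v) for k, v in attrs
                if k in ALLOWED_TAGS[tag] and v is not None]
        attr_text = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Re-escape text so stray "<" or "&" cannot form new markup.
            self.parts.append(escape(data, quote=False))


def sanitize_statement(html_text):
    """Return html_text reduced to the allowlisted tags and attributes."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

With this approach, keeping or dropping links becomes a one-line change to the allowlist. The sketch does not validate `href` values (for example, it would pass a `javascript:` URL through), so a real importer would need an additional scheme check.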

RogueCSstaffer commented 6 years ago

What you scraped from CSD are not biographies. They are research statements by tenure or research faculty who advise PhD students. There is no bio info for anyone in CSD and only those able to advise doctoral students have something in the research statement block. The purpose of that block is "what I research" not "who I am".

Why are we producing intermediate or duplicate information rather than showing top-level information and pointing to the departments that maintain the details? Many of the areas you are attempting to populate are not remotely consistent across departments, which will make them difficult to fill in.

If the goal is to keep the SCS site as maintenance-free as possible, this doesn't seem to serve that end.

RogueCSstaffer commented 6 years ago

In the cited example the SCS version isn't very useful. It is hard to read with no formatting, and it doesn't retain the pertinent information that was linked in the dept version. Note that RI calls these "Statements" and not bios either.

Further example: http://cs-dev.cmu.edu/directory/umut_acar

This CSD faculty member has a link to his homepage under his name; the RI faculty member referenced above does not. His SCS entry could link to the RI page you referenced for the details rather than duplicating them.

At the top level (SCS listing), their name, official title, link to their website, location, and on-campus contact information are all the pertinent details, and these should be easily maintainable via automated inputs. The publications are a nice bonus, but may be limited or blank depending upon department or faculty sites.
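
As a rough illustration of the top-level record described above, here is a minimal sketch. The class name `DirectoryEntry`, the field names, and the helper `missing_fields` are all hypothetical and do not reflect the actual SCS schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DirectoryEntry:
    """Hypothetical top-level SCS listing for one faculty member."""
    name: str
    title: str
    website: Optional[str] = None    # link to the department or personal page
    location: Optional[str] = None   # on-campus office
    contact: Optional[str] = None    # on-campus contact information
    publications: List[str] = field(default_factory=list)  # optional bonus


def missing_fields(entry):
    """List the field names an automated import left empty."""
    return [f.name for f in fields(entry)
            if getattr(entry, f.name) in (None, "", [])]
```

A record like this holds only what automated inputs can keep current. Research statements would stay on the department page that `website` points to.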

What are the projected refresh times for this type of content? I generally ask faculty to give me updates to statements a couple times a year.
