Webdevdata / webdevdata.org

Website for reports, etc.

usage examples #3

Open nwtn opened 11 years ago

nwtn commented 11 years ago

It would be awesome to see some typical examples of how to search through these data for specific tags, etc. It could eliminate a barrier to use.

marcoscaceres commented 11 years ago

@nwtn agreed. Do you want to volunteer to do them? They are just a few simple greps from the command line.

nwtn commented 11 years ago

sure

marcoscaceres commented 11 years ago

Here is a somewhat crappy one:

find ./ -print | xargs grep -l picturefill

marcoscaceres commented 11 years ago

Better one, thanks to @yoavweiss. It counts the HTML files that contain "apple-touch-icon":

find ./ -name "*ml.txt" | xargs grep -l apple-touch-icon | wc -l

oli commented 10 years ago

If you want to use the included tools, check out:

Webdevdata/webdevdata-tools

These tools produce comma-separated output, with one line per matched page.

baptistelebail/webdevdata.org

These produce semicolon-separated summaries. For webdevdata-query.sh this includes:

Refer to the Wiki page for details and examples

webdevdata-query.sh took about 40 minutes per pass on the 2013-10 dataset for me, regardless of the number of CSS-like query terms, so if you’re querying multiple things (e.g. all the sectioning elements) list them all in one query, e.g.:

./webdevdata-query.sh webdevdata.org-2013-10-30-231036 body article section nav h1 h2 h3 h4 h5 h6 hgroup main

HTH
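
For anyone without the query tool at hand, the same "one pass, many terms" idea can be roughly sketched with find and grep. This is only an approximation under stated assumptions, not what webdevdata-query.sh does: it matches the literal text "<tag" instead of evaluating CSS-like queries, it counts occurrences rather than pages, and it relies on grep -o, which GNU and BSD grep support but POSIX does not require. The function name count_tags is made up for illustration; the "*ml.txt" file pattern is borrowed from the apple-touch-icon example earlier in the thread.

```shell
#!/bin/sh
# count_tags DIR TAG...   (hypothetical helper, not part of any webdevdata tool)
# Walks DIR once and prints "tag;count" for every TAG found at least once,
# where count is the number of opening-tag occurrences across all files.
count_tags() {
    dir=$1
    shift
    # Join the tag names into one alternation, e.g. "body|section|h1".
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%|}
    # -o prints only the matched text, -h drops file names, -i ignores case.
    find "$dir" -type f -name "*ml.txt" \
        -exec grep -ohiE "<($pattern)([[:space:]>/]|\$)" {} + |
        sed 's/^<//; s/[^[:alnum:]]$//' |   # strip "<" and the trailing delimiter
        tr '[:upper:]' '[:lower:]' |        # fold <Section> and <section> together
        awk '{ c[$0]++ } END { for (t in c) print t ";" c[t] }' |
        sort
}

# Example, assuming the dataset directory named in the command above:
#   count_tags webdevdata.org-2013-10-30-231036 body article section nav
```

Because all tags share a single regular expression, adding more of them should not add another traversal of the dataset, which is the same reason for listing every term in one webdevdata-query.sh call.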

marcoscaceres commented 10 years ago

@oli, so I think what we are going to do is allow each repo to provide examples of its own usage. The front page of webdevdata.org is currently very poorly maintained :(

oli commented 10 years ago

@marcoscaceres here are some more to get you going then:

find ./ -type f | xargs grep -il "<html" | wc -l
find ./ -type f -name "*.assembler" -exec head -n 2 '{}' +
find ./ -type f -not -name "*.hdr.txt" -size -100c | wc -l

HTH!
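
For reference, here is a hedged sketch of the three commands above wrapped as shell functions with a comment on what each one reports. The function names are invented for illustration and are not part of any webdevdata tool; the "<html" match, the "*.assembler" and "*.hdr.txt" patterns, and the 100-byte threshold come straight from the commands above. The sketch swaps the xargs pipe for find's -exec ... + form and -not for the POSIX spelling "!", on the assumption that file names containing spaces should not break the count.

```shell
#!/bin/sh
# Hypothetical helper names; each takes the dataset directory as its argument.

# Count the files that contain "<html" anywhere, ignoring case.
count_html_files() {
    find "$1" -type f -exec grep -il "<html" {} + | wc -l
}

# Print the first two lines of every *.assembler file.
show_assembler_heads() {
    find "$1" -type f -name "*.assembler" -exec head -n 2 {} +
}

# Count the files smaller than 100 bytes, leaving out *.hdr.txt files.
count_tiny_files() {
    find "$1" -type f ! -name "*.hdr.txt" -size -100c | wc -l
}

# Example: count_html_files ./
```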

marcoscaceres commented 10 years ago

@oli super helpful! Thanks so much for all these! Ok, we now have a pretty good set to show how this all works.

Will probably just start by collating all these and adding them to the README.