climate-mirror / datasets

For tracking data mirroring progress

EPA Transportation and Air Quality (TAQ) #360

Open hkuchampudi opened 7 years ago

hkuchampudi commented 7 years ago

EPA Transportation and Air Quality (TAQ)

I have mined the metadata for the EPA's Transportation and Air Quality (TAQ) documents and have hosted the direct download links to the documents in my repository. I need help mining the documents themselves as I do not have the space to download them.

Downloading PDFs

You can run the following command, replacing the bracketed placeholders with the appropriate values, to download a range of files in bulk:

awk 'FNR>=[Starting_Line_Number] && FNR<=[Ending_Line_Number]' [Links_Location] | while read -r link; do curl --retry 10 -OL "$(printf '%s' "$link" | tr -d '\r')"; done
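For anyone splitting the list among several volunteers, the same idea can be sketched as two small shell functions. This is only an illustration, not the script used to build the list: the function names and the failed_links.txt log file are made up for this example, and it assumes one URL per line with possible Windows line endings.

```shell
# Print lines START..END of a links file, stripping carriage returns
# and skipping blank lines. (Function names are illustrative only.)
links_in_range() {
    awk -v s="$1" -v e="$2" 'FNR >= s && FNR <= e' "$3" | tr -d '\r' | awk 'NF'
}

# Download every link in the range with the same curl options as the
# one-liner (--retry 10, -O, -L). --fail makes curl return non-zero on
# HTTP errors, so failed links are appended to failed_links.txt
# (a hypothetical log file name) for a later retry.
download_range() {
    links_in_range "$1" "$2" "$3" | while IFS= read -r link; do
        curl --fail --retry 10 -OL "$link" || printf '%s\n' "$link" >> failed_links.txt
    done
}
```

For example, `download_range 1 5000 Links.txt` would fetch the first 5000 documents, and the next person could take 5001 to 10000.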

Download Information

| Property | Value |
| --- | --- |
| Number of links/documents | 25690 |
| Estimated total file size | 11 GB |

JeremiahCurtis commented 7 years ago

Well, this is interesting. I ran DownThemAll (Firefox) on the list of OTAQ links provided in the original post. It worked like a charm for the first half, but then I got thousands of 'file access errors'. Everything on the list up to https://iaspub.epa.gov/otaqpub/display_file.jsp?docid=20729&flag=1 downloaded fine, but nothing after that did.
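If a run stops partway like this, one way to pick up where it left off is to print only the links that come after the last one that downloaded. A rough sketch, assuming the links file has one URL per line in the same order the downloader used; links_after is a made-up helper name:

```shell
# Print every line of FILE that comes after the line exactly equal to
# LAST_URL. Carriage returns are stripped first so Windows line endings
# do not break the comparison. Prints nothing if LAST_URL is not found.
links_after() {
    tr -d '\r' < "$2" | awk -v last="$1" 'found { print } $0 == last { found = 1 }'
}
```

For example, `links_after 'https://iaspub.epa.gov/otaqpub/display_file.jsp?docid=20729&flag=1' Links.txt > remaining.txt` would write the untried links to remaining.txt (a hypothetical file name), which can then be fed to any downloader.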

HostileGranola commented 7 years ago

I am pulling this data with the following command:

wget --content-disposition --trust-server-names -i https://raw.githubusercontent.com/hkuchampudi/GovDataDump/master/EPA%20Transport%20and%20Air%20Quality/Links.txt

I will update with sizes and hashes once it is all downloaded.

HostileGranola commented 7 years ago

I have a copy of this data as of 12:00:00 UTC.

Hashes computed using hashdeep -erl are here.

Sizes computed using du -b --max-depth=1 --human-readable are here.

x775 commented 7 years ago

I have a complete copy as of this posting.

md5: 4735e9dc629746010baca30e368046d1
sha256: 86122a2a39cbbd2b35fd702b9c4e05106c46791a8dab4f36626dfe2815258656

Individual checksums: https://gist.github.com/x775/8cf8445faed2c47fc7702ad898be055d

Size: 11.67764GB

Compressed name: EPA_Transportation_and_Air_quality.7z
Compressed md5: 3430c5799f66bcf248bae01771164938
Compressed sha256: 55fa39a1085c60c2611fc1a41577ed3eec5eb788a81e22f07a3623a54b3eb25c
Compressed size: 5.20261GB
Compressed download link: https://drive.google.com/open?id=0B6PlQrUTwL1PcmQ3Y29IOGhGY0U
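Anyone who downloads one of these archives can check it against the posted sha256 before unpacking. A minimal sketch, assuming GNU coreutils' sha256sum is installed; verify_sha256 is a made-up helper name:

```shell
# Return 0 if FILE's SHA-256 digest equals EXPECTED (case-insensitive),
# 1 if it differs, and 2 if FILE does not exist.
verify_sha256() {
    [ -f "$1" ] || return 2
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}
```

For example: `verify_sha256 EPA_Transportation_and_Air_quality.7z 55fa39a1085c60c2611fc1a41577ed3eec5eb788a81e22f07a3623a54b3eb25c && echo OK`.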

derpasaurusz commented 7 years ago

Hello, I have a public mirror for this data located here.

Hashes computed using hashdeep -erl are located here.

Size: 12GB
Compressed name: EPA_Transportation_and_Air_Quality_TAQ.tar.bz2
Compressed md5: a1d68b6a2b280b1b5344774f8b62cac7
Compressed sha256: a28a5a4b9d6832a040219ddedfb748885160a79805327cfe7294c1d6a41b9514
Compressed size: 9.3GB
Compressed download link can be found here

Still fairly new to this. Let me know if I missed anything.

HostileGranola commented 4 years ago

I no longer have this dataset due to storage space constraints. Apologies.