code-for-portland-archive / data

A repo for tracking public datasets found by the Code for Portland community.

PDX Public Data Mirror #3

Open ungoldman opened 9 years ago

ungoldman commented 9 years ago

@ed-g has been kind enough to download the contents of ftp://ftp02.portlandoregon.gov/CivicApps/ and upload them to this repo, with the exception of Building_Footprints_pdx.zip, which exceeds GitHub's file size limit for uploads.

https://github.com/CodeForPortland/data/tree/master/portlandoregon.gov/CivicApps

@meandavejustice have at it! We can talk about what we'd like to do in this issue.

Things we've discussed:

ungoldman commented 9 years ago

/cc @bcomnes @davy @maxogden

bcomnes commented 9 years ago

What are some free storage options that would work well with this? S3?

ed-g commented 9 years ago

Is no-cost a requirement or is there a budget for storage?

ungoldman commented 9 years ago

I may already have some extra credit for AWS and Digital Ocean thanks to Code for America. I can look it up and see how to set it up this week if that seems like a good option.

ungoldman commented 9 years ago

Actually @ed-g were you working on an OTP project with @pdxmele at OSBridge a while back?

ed-g commented 9 years ago

Yes, I was. You can see the beta at http://maps.ed-groth.com

If you decide to go in the server-based Digital Ocean direction, I'd be happy to install the mirror tools there.

ungoldman commented 9 years ago

@ed-g the reason I ended up with AWS & DO credit was @pdxmele asked me if we could get some because you were looking for hosting help for that project. Guess the follow-up got lost in transit (probably my fault). Glad you're back in the loop!

ed-g commented 9 years ago

@ngoldman cool!

Let's talk at some point about setting up server space for the bike there trip planner.

I've created a project: https://github.com/ed-g/ride-there-app/issues/1

ungoldman commented 9 years ago

Cloned the contents of ftp://ftp02.portlandoregon.gov/CivicApps/ to my work computer. Deleted the contents of the 7101 Copy directory as it's just a bunch of weird photos of an abandoned house that I don't think are relevant to CivicApps.

Total size of relevant contents for reference:

508,773,539 bytes (509 MB on disk) for 113 items
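For anyone who wants to reproduce a size figure like the one above on their own copy, here is a minimal Python sketch. The function name `dir_stats` is made up for illustration, and it assumes "items" means files plus folders, which may not match how the number above was counted.

```python
import os


def dir_stats(root):
    """Return (total_bytes, item_count) for everything under root.

    Assumption: an "item" is any file or directory below root.
    Symlinks are counted as items but their sizes are skipped.
    """
    total_bytes = 0
    items = 0
    for dirpath, dirnames, filenames in os.walk(root):
        items += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return total_bytes, items


# Example (path is hypothetical):
# size, count = dir_stats("portlandoregon.gov/CivicApps")
# print(f"{size:,} bytes for {count} items")
```

The byte total is the sum of file sizes, not the space used on disk, so it can differ slightly from what a file manager reports.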

ungoldman commented 9 years ago

Cloning the entire contents of ftp02.portlandoregon.gov just to see what's in there since it's public.

ungoldman commented 9 years ago

Took more than 24 hours but I have a local and mostly complete copy of ftp02.portlandoregon.gov on my work machine. Some stats:

There were some errors (a couple of files wouldn't transfer over even when I tried a second time), but 99.99% of the stuff got gotten.

It seems like the FTP server is meant to be shared by quite a few departments for low-security stuff (i.e. no sensitive data or anything precluding them from exposing it to the public). I did notice someone dumped a garbled password reset file for an account, and looking at the number of files, I wouldn't be surprised if there's some other information on there that really shouldn't be public.
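In case it helps anyone repeat the mirror, here is a rough Python sketch of a recursive FTP copy that records files that fail to transfer instead of stopping. This is not the tool that was actually used for the copy described above; `mirror_dir` is a made-up name, and it assumes the server allows anonymous login and that `NLST` returns bare names for the current directory (some servers return full paths instead).

```python
import os
from ftplib import FTP, all_errors, error_perm


def mirror_dir(ftp, remote_dir, local_dir, failed):
    """Recursively copy remote_dir into local_dir.

    Paths that cannot be downloaded are appended to `failed`
    as (remote_path, error_message) tuples.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    for name in ftp.nlst():
        if name in (".", ".."):
            continue
        remote_path = remote_dir.rstrip("/") + "/" + name
        local_path = os.path.join(local_dir, name)
        try:
            # If we can change into it, treat it as a directory.
            ftp.cwd(remote_path)
        except error_perm:
            # Otherwise assume it is a regular file and download it.
            try:
                with open(local_path, "wb") as fh:
                    ftp.retrbinary("RETR " + remote_path, fh.write)
            except all_errors as exc:
                failed.append((remote_path, str(exc)))
                if os.path.exists(local_path):
                    os.remove(local_path)  # drop the partial file
            continue
        mirror_dir(ftp, remote_path, local_path, failed)
        ftp.cwd(remote_dir)  # return to this directory before the next entry


# Example (assumes anonymous access is permitted):
# ftp = FTP("ftp02.portlandoregon.gov")
# ftp.login()
# failed = []
# mirror_dir(ftp, "/CivicApps", "CivicApps", failed)
# print(failed)
```

Keeping the failures in a list makes it easy to retry just those paths later rather than re-running the whole transfer.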

There's a clue as to the official function and guidelines for the server here:

ftp://ftp02.portlandoregon.gov/_Instructions_/Using%20the%20Public%20FTP%20Server.pdf

It's not very well organized, and most of it is not relevant to open data; it's simply where city employees are dumping files they're working with. I can see why some people might want to decommission it and move to a better model for sharing data internally and exposing open data externally.

ungoldman commented 9 years ago

A vaguely accurate count of filetypes can be found here: https://gist.github.com/ngoldman/7619ae1a8cb2f032442f
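A count like that can be produced from a local copy with something along these lines. This is only a sketch and not necessarily how the gist was generated; `count_extensions` is a made-up name, and it treats only the last extension as the file type (so `.tar.gz` counts as `.gz`), which is part of why such a count is only approximate.

```python
import os
from collections import Counter


def count_extensions(root):
    """Count files under root by lowercased extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            # Files without an extension are grouped under "(none)".
            counts[ext or "(none)"] += 1
    return counts


# Example (path is hypothetical):
# for ext, n in count_extensions("ftp02.portlandoregon.gov").most_common(20):
#     print(f"{n:8d}  {ext}")
```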