datopian / datapipes

Data Pipes for CSV
https://datapipes.okfnlabs.org/
MIT License

Passing in a URL to an excel file produces undesirable results #103

Open andylolz opened 10 years ago

andylolz commented 10 years ago

e.g. this: http://datapipes.okfnlabs.org/none?url=https://github.com/okfn/messytables/raw/master/horror/simple.xls

davidmiller commented 10 years ago

I would v. much like to pass in this excel sheet [1] as the url and then drop the nonsense headers with a datapipes transform...

[1] https://indicators.ic.nhs.uk/download/GP%20Practice%20data/summaries/demography/Practice%20Addresses%20Final.xls

rufuspollock commented 10 years ago

@davidmiller issue is we need excel parsing in node and it doesn't seem to exist (maybe for xlsx) ...

davidmiller commented 10 years ago

Pass off to http://okfnlabs.org/dataconverters/ As-A-Service?

rufuspollock commented 10 years ago

@davidmiller sure but we need that deployed "as a service" :-) (easy to do but needs a small bit of work i imagine).

SheetJSDev commented 10 years ago

> we need excel parsing in node and it doesn't seem to exist (maybe for xlsx) ...

@rgrp shameless plug: xlsjs on npm is an XLS parser (the javascript also works in-browser: http://oss.sheetjs.com/js-xls/ )

rufuspollock commented 10 years ago

@SheetJSDev that is awesome :-) We'd love to use this if that was ok :-)

SheetJSDev commented 10 years ago

It's Apache 2.0 licensed and the source is on github ( https://github.com/SheetJS/js-xls ) so there really shouldn't be a problem.

rufuspollock commented 10 years ago

@SheetJSDev this is absolutely fantastic. Please say what kind of credit you'd like us to have on the site.

rufuspollock commented 10 years ago

@davidmiller would you be up for having a go at an incoming parser based on this?

davidmiller commented 10 years ago

Entirely possible.

What's the status of implementing all the transforms etc as fail-early streams - that was the major issue last time I was paying close attention?

rufuspollock commented 10 years ago

@davidmiller fail-early streams are ongoing in #110, but that wouldn't be a blocker for this (I mean, we can't stream an Excel file in the true sense anyway, since you need to read the whole file to use it, IIRC).

davidmiller commented 10 years ago

Also - you're less likely to get 12GB excel files - so less of an issue here one suspects.

We can turn the parsed result into a stream for downstream consumption as a reasonable compromise.
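A rough sketch of that compromise, using only Node's built-in `stream` module: buffer the whole upload, parse it in one go, then hand the rows downstream as a stream of CSV lines. Everything named here is an assumption for illustration and not datapipes code. In particular, `parseWorkbook` is a hypothetical function that takes a Buffer and returns an array of row arrays; a real XLS parser would need a thin wrapper to fit that shape.

```javascript
const { Readable } = require('stream');

// Quote a cell only when it contains a comma, a double quote or a line break.
function csvCell(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Lazily yield one CSV line per row, so downstream steps can consume row by row.
function* rowsToCsvLines(rows) {
  for (const row of rows) {
    yield row.map(csvCell).join(',') + '\n';
  }
}

// Collect the entire input into a single Buffer. This is the non-streaming part:
// the workbook has to be fully read before it can be parsed.
async function readAll(input) {
  const chunks = [];
  for await (const chunk of input) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks);
}

// parseWorkbook is hypothetical: (Buffer) => array of row arrays.
async function excelToCsvStream(input, parseWorkbook) {
  const rows = parseWorkbook(await readAll(input));
  return Readable.from(rowsToCsvLines(rows));
}
```

Memory use is bounded by the size of the workbook rather than being constant, which seems acceptable given that very large Excel files are unlikely. The returned stream could then be piped into the existing transforms like any other CSV source.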