Mikanebu closed 6 years ago
Analysis can be found in spreadsheet https://docs.google.com/spreadsheets/d/14kJluhePaMOx6vYBic0poVjDYK3I8-v_xF3sv4Focac/edit#gid=405792816
3 datasets to be automated:
- gold-prices - https://github.com/datasets/gold-prices
  The date is in YYYY-MM format, so we have 2 options:
  - yearmonth, according to the specs https://frictionlessdata.io/specs/table-schema/. But I cannot process it, since it throws a casting error; I opened issue https://github.com/frictionlessdata/tableschema-py/issues/201.
  - date with format any. But in this case the day will be automatically set to the current day. For example, if the pipeline runs today, the date will be 1991-12-{current-day}.
- oil-prices - https://github.com/datasets/oil-prices - day, week, month, year
- s-and-p-500 - https://github.com/datasets/s-and-p-500
  Change 1991.01 to 1991-01 by replacing . with - in the date column.

FIXED. We have now filled in all the analysis columns, and here are the major new features needed:
Analysis spreadsheet is located here https://docs.google.com/spreadsheets/d/14kJluhePaMOx6vYBic0poVjDYK3I8-v_xF3sv4Focac/edit#gid=405792816
We want to automate more datasets, so they will be updated regularly.
Acceptance criteria
Tasks
- .xls and .xlsx
- .txt
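For the .xls and .xlsx sources, one thing to watch is that date cells can come through as Excel serial numbers, as with oil-prices, where 5/20/1987 shows up as 31917.0. Below is a minimal sketch of converting such a value back to a date. It assumes the workbook uses Excel's default 1900 date system, and the helper name excel_serial_to_date is hypothetical, not part of any pipeline library.

```python
from datetime import date, timedelta

# Day 0 for the Excel 1900 date system. Using 1899-12-30 absorbs Excel's
# phantom 1900-02-29, so serials from March 1900 onward map correctly.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: float) -> date:
    """Convert an Excel serial day number to a date (time fraction dropped)."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


print(excel_serial_to_date(31917.0).isoformat())  # 1987-05-20
```

If the reader library in use can return typed date cells directly, that would be preferable to converting after the fact; this sketch only covers the case where the raw number is all we get.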
Analysis
- eu-emissions-trading-system - https://github.com/datasets/eu-emissions-trading-system
- sea-level-rise - https://github.com/datasets/sea-level-rise
- oil-prices - https://github.com/datasets/oil-prices
- gold-prices - https://github.com/datasets/gold-prices
- co2-ppm - https://github.com/datasets/co2-ppm
- s-and-p-500 - https://github.com/datasets/s-and-p-500 - 1991.02 to 01-02-1991
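The date fixes discussed in this issue (YYYY-MM for gold-prices, 1991.01 style values for s-and-p-500) can be sketched in plain Python. This is only an illustration, not the pipeline's actual processor, and the function names normalize_year_month and to_first_of_month are made up here. Parsing with an explicit format pins the day to 1, rather than picking up the current day the way a date field with format any does.

```python
from datetime import date, datetime


def normalize_year_month(value: str) -> str:
    """Turn '1991.01' into '1991-01' by replacing '.' with '-'."""
    return value.strip().replace(".", "-")


def to_first_of_month(value: str) -> date:
    """Parse 'YYYY-MM' (or 'YYYY.MM') into a date pinned to day 1."""
    return datetime.strptime(normalize_year_month(value), "%Y-%m").date()


# The s-and-p-500 case from the analysis: 1991.02 to 01-02-1991.
print(to_first_of_month("1991.02").strftime("%d-%m-%Y"))  # 01-02-1991
```

Pinning to the first of the month is a design choice for this sketch; it keeps the output stable between pipeline runs, which the current-day behaviour does not.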
Output:
- eu-emissions-trading-system - https://github.com/datasets/eu-emissions-trading-system. I could not scrape it: after unzipping, the file cannot be processed. It works when the file name is the same as the zipped version.
- sea-level-rise - https://github.com/datasets/sea-level-rise. I could not scrape it: after unzipping, the file cannot be processed. It works when the file name is the same as the zipped version.
- oil-prices - https://github.com/datasets/oil-prices. Source is https://www.eia.gov/dnav/pet/hist_xls/RBRTEd.xls. It converts the date format incorrectly, from 5/20/1987 to 31917.0.
- gold-prices - https://github.com/datasets/gold-prices. I could not remove the last row. Source is http://www.bundesbank.de/cae/servlet/StatisticDownload?tsId=BBEX3.M.XAU.USD.EA.AC.C06&its_csvFormat=en&its_fileFormat=csv&mode=its
- co2-ppm - https://github.com/datasets/co2-ppm. Source is in .txt format: ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt. It did not work for txt either; according to the tabulator-py documentation, a txt source is split into rows with a single "data" column.
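Since the co2-ppm source is whitespace-delimited text rather than CSV, one possible workaround is to split the lines ourselves before handing rows to the pipeline. The following is a rough sketch under two assumptions: comment lines start with # and columns are separated by runs of spaces. The helper parse_whitespace_table is hypothetical, and the sample values are made up for illustration and are not real NOAA data.

```python
from typing import Iterable, List


def parse_whitespace_table(lines: Iterable[str], comment: str = "#") -> List[List[str]]:
    """Split whitespace-delimited lines into rows, skipping blanks and comments."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment):
            continue  # ignore empty lines and comment/header lines
        rows.append(stripped.split())  # split on any run of whitespace
    return rows


# Made-up sample lines, only to show the shape of the input.
sample = [
    "# header comment",
    "1991  1   1991.042   100.00",
    "",
    "1991  2   1991.125   101.00",
]
rows = parse_whitespace_table(sample)
print(rows)
```

Values are left as strings here so that type casting can stay with the schema step of the pipeline.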