Open s-celles opened 8 years ago
This script (process.py)
#!/usr/bin/env python
import glob
import os.path
import traceback

import click
import pandas as pd

pd.set_option("display.max_rows", 10)


@click.command()
@click.option('--filename', default='immobilier-ain-01.xls', help='Filename')
@click.option('--output', default='csv', help='Type of output')
@click.option('--skiprows', default=4, help='Skip rows')
@click.option('--all/--no-all', default=False)
def main(all, filename, output, skiprows):
    output = output.lower()
    if all:
        # Process every department file found in the current directory
        filenames = glob.glob("immobilier-*-*.xls")
    else:
        filenames = [filename]
    errors = 0
    for filename in filenames:
        try:
            process(filename, output, skiprows)
        except Exception:
            # Report the failure and carry on with the next file
            print(traceback.format_exc())
            errors += 1
    print("%d files processed (errors: %d)" % (len(filenames), errors))


def process(filename, output, skiprows):
    print("Read '%s'" % filename)
    # Three header rows, so the columns come back as a MultiIndex.
    # sheet_name replaces the older sheetname keyword (pandas >= 0.21).
    df = pd.read_excel(filename, skiprows=skiprows, sheet_name="Données", header=[0, 1, 2])
    print(df)
    print(df.columns)
    if output == 'csv':
        filename_out = os.path.splitext(filename)[0] + ".csv"
        print("Write to '%s'" % filename_out)
        df.to_csv(filename_out, index=False)


if __name__ == "__main__":
    main()
can process all department files and output to CSV.
Some manual cleanup (renaming columns...) may be necessary.
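As one possible way to do the column renaming mentioned above: because the script reads three header rows (header=[0,1,2]), each column label is a tuple of three levels. The sketch below collapses such a tuple into a single name. The helper name flatten_label and the sample labels are hypothetical, and it assumes pandas marks blank header cells with names starting with "Unnamed:", which is its usual behaviour but worth checking against the real files.

```python
def flatten_label(levels, sep="_"):
    """Join the levels of one multi-level column label into a single string."""
    parts = []
    for level in levels:
        text = str(level).strip()
        # Assumption: blank header cells show up as "Unnamed: ..." placeholders.
        if not text or text.startswith("Unnamed:"):
            continue
        # Merged header cells can repeat the same text on consecutive levels.
        if parts and parts[-1] == text:
            continue
        parts.append(text)
    return sep.join(parts)


# Hypothetical labels, shaped like the tuples a three-row header produces.
example_columns = [
    ("Unnamed: 0_level_0", "Unnamed: 0_level_1", "Commune"),
    ("Appartements", "Anciens", "Nombre de ventes"),
    ("Maisons", "Maisons", "Nombre de ventes"),
]
flat_columns = [flatten_label(col) for col in example_columns]
print(flat_columns)
```

In process(), this could be applied with df.columns = [flatten_label(col) for col in df.columns] just before df.to_csv(...). Any further renaming would still need to be done by hand.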
Great @scls19fr! Do you plan to make a house-prices-fr datapackage?
I'm quite busy these days. But I think that someone (maybe you if you can) can do it quite easily thanks to these 3 scripts. I'm using Anaconda Python.
requests, click, and pandas are required packages. Install each one with:
conda install package_name
I'll do it... In a few days !
@lexman how are you doing here?
Hello,
All I've been able to do so far is look at the files.
Unfortunately, the files for each department contain only the number of houses sold, not the prices; the only file with prices is about Paris.
I won't be able to package it for a couple of weeks. Be back in a while...
@lexman Any progress here? maybe need some help?
Sorry @zelima, I was short on time for the past few weeks. I'll get back to it in a few days...
@lexman I'd leave this open here until we have a complete data package otherwise we lose track (even if there is a separate issue on the dataset repo). Hope that makes sense.
And great to see this in progress!
Sorry, I didn't notice I had closed the issue. It seems I pushed the wrong button when I made the comment... I'll let you know when the work is done :)
@lexman great!
@lexman any updates here? /cc @Mikanebu
@lexman I've cloned your repo to https://github.com/datasets/house-prices-fr so we can get it up to scratch and get it published. I trust this is ok 😄
@AcckiyGerman can you take a look at https://github.com/datasets/house-prices-fr, check it is working, and get it published (over the next few weeks)?
yes, will do.
Following https://github.com/datasets/registry/issues/55
From @lexman
This Python script might help to download files for each department:
Processing may also be done using Python Pandas to output "raw" CSV files.
This script can also help to find xls, xlsx, csv links
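The link-finding script itself is not reproduced in this thread. As a rough illustration of the idea only, and not the original script, here is a minimal standard-library sketch that collects links ending in .xls, .xlsx or .csv from a page. The class and function names, the sample HTML, and the example.org base URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

DATA_EXTENSIONS = (".xls", ".xlsx", ".csv")


class DataLinkParser(HTMLParser):
    """Collect href targets of <a> tags that point at spreadsheet or CSV files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        # Check only the path so a query string does not hide the extension.
        if urlparse(url).path.lower().endswith(DATA_EXTENSIONS):
            self.links.append(url)


def find_data_links(html, base_url):
    """Return absolute URLs of all data-file links found in the HTML text."""
    parser = DataLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment, for illustration only.
sample_html = """
<a href="/files/immobilier-ain-01.xls">Ain</a>
<a href="notes.html">Notes</a>
<a href="data/summary.CSV?v=2">Summary</a>
"""
found_links = find_data_links(sample_html, "https://example.org/stats/")
print(found_links)
```

With the requests package listed above, the HTML for a real page could be obtained with requests.get(page_url).text and passed to find_data_links together with page_url.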