atmdata / atmdata.github.io

Source code for the atmdata.github.io website
https://atmdata.github.io

Fixed typo and added new air quality data source in data_sources.md #5

Closed: arthurlli closed this 3 years ago

arthurlli commented 3 years ago

Hi, I added an open data source provided by the Japanese government, although it is in Japanese. Some keywords are translated below:

大気環境 = atmospheric environment
時間値データ(2009~2018年度) = hourly (time series) data, fiscal years 2009-2018
都道府県名 = prefecture name
全国 = entire country (i.e., all prefectures)
年度 = (fiscal) year

xoolive commented 3 years ago

Thanks, many thanks! Just wondering about the format of the data we download: it's called .txt, but is it binary in the end? Any experience using/parsing it?

arthurlli commented 3 years ago

Yes, the data file is in .txt format. The reason it looks like binary is that it contains SO2 concentrations in ppb, which are usually 1 or 0 in Japan. If we open other data files, we will see the difference.

However, to import such data, we need to follow the instructions here, then find the "File Origin" setting and set it to "932: Japanese (Shift-JIS)" to view the column names (screenshot omitted). The corresponding columns translate to:

Year
Measuring station code
District code
Type of pollutant
Unit
Month
Day
1st ~ 24th hour

Unfortunately, there is no official English version (a pandas sketch of this layout follows below).
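For reference, here is a minimal pandas sketch for reading one of these hourly files under the layout described above (year, measuring station code, district code, pollutant type, unit, month, day, then 24 hourly values). The file name is hypothetical, and the positional renaming assumes the columns appear exactly in that order.

import pandas as pd

# Hypothetical file name; cp932 is the Windows superset of Shift-JIS
# used by these files.
df = pd.read_csv("hourly_so2.txt", sep=",", encoding="cp932")

# English names assigned purely by position, following the translated
# layout above: year, station code, district code, pollutant, unit,
# month, day, then the 1st to 24th hourly values.
english_columns = (
    ["year", "station_code", "district_code", "pollutant", "unit",
     "month", "day"]
    + [f"hour_{h:02d}" for h in range(1, 25)]
)
if len(df.columns) == len(english_columns):
    df.columns = english_columns

print(df.head())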

xoolive commented 3 years ago

I couldn't reproduce what you recommend, but I got it the old-fashioned way with iconv. I'll keep my opinions about encoding to myself... 🙄

$ iconv -f SHIFT_JIS -t UTF-8 TD20180126.txt | head -n5
測定年度,項目種類コード,項目コード_数字,項目コード_英数字,測定方法コード,都道府県コード,都道府県名,都道府県名_ローマ字,市区町村コード,市区町村名,市区町村名_ローマ字,測定局コード,測定局名,測定局名_ローマ字,測定局区分コード,測定局種別コード,用途地域コード,用途地域名,令別表第3の区分,有効測定日数(日),測定時間(時間),年平均値(ppm),1時間値が0.1ppmを超えた時間数(時間),1時間値が0.1ppmを超えた時間数の測定時間数に対する割合(%),日平均値が0.04ppmを超えた日数(日),日平均値が0.04ppmを超えた日数の有効測定日数に対する割合(%),1時間値の最高値(ppm),日平均値の2%除外値(ppm),日平均値が0.04ppmを超えた日が2日以上連続したことの有無(有:X・無:O),環境基準の長期的評価による日平均値が0.04ppmを超えた日数(日),測定方法,年間集計項目13,年間集計項目14,年間集計項目15,年間集計項目16,有効測定日数(日)_4月,有効測定日数(日)_5月,有効測定日数(日)_6月,有効測定日数(日)_7月,有効測定日数(日)_8月,有効測定日数(日)_9月,有効測定日数(日)_10月,有効測定日数(日)_11月,有効測定日数(日)_12月,有効測定日数(日)_1月,有効測定日数(日)_2月,有効測定日数(日)_3月,測定時間(時間)_4月,測定時間(時間)_5月,測定時間(時間)_6月,測定時間(時間)_7月,測定時間(時間)_8月,測定時間(時間)_9月,測定時間(時間)_10月,測定時間(時間)_11月,測定時間(時間)_12月,測定時間(時間)_1月,測定時間(時間)_2月,測定時間(時間)_3月,月平均値(ppm)_4月,月平均値(ppm)_5月,月平均値(ppm)_6月,月平均値(ppm)_7月,月平均値(ppm)_8月,月平均値(ppm)_9月,月平均値(ppm)_10月,月平均値(ppm)_11月,月平均値(ppm)_12月,月平均値(ppm)_1月,月平均値(ppm)_2月,月平均値(ppm)_3月,1時間値が0.1ppmを超えた時間数(時間)_4月,1時間値が0.1ppmを超えた時間数(時間)_5月,1時間値が0.1ppmを超えた時間数(時間)_6月,1時間値が0.1ppmを超えた時間数(時間)_7月,1時間値が0.1ppmを超えた時間数(時間)_8月,1時間値が0.1ppmを超えた時間数(時間)_9月,1時間値が0.1ppmを超えた時間数(時間)_10月,1時間値が0.1ppmを超えた時間数(時間)_11月,1時間値が0.1ppmを超えた時間数(時間)_12月,1時間値が0.1ppmを超えた時間数(時間)_1月,1時間値が0.1ppmを超えた時間数(時間)_2月,1時間値が0.1ppmを超えた時間数(時間)_3月,日平均値が0.04ppmを超えた日数(日)_4月,日平均値が0.04ppmを超えた日数(日)_5月,日平均値が0.04ppmを超えた日数(日)_6月,日平均値が0.04ppmを超えた日数(日)_7月,日平均値が0.04ppmを超えた日数(日)_8月,日平均値が0.04ppmを超えた日数(日)_9月,日平均値が0.04ppmを超えた日数(日)_10月,日平均値が0.04ppmを超えた日数(日)_11月,日平均値が0.04ppmを超えた日数(日)_12月,日平均値が0.04ppmを超えた日数(日)_1月,日平均値が0.04ppmを超えた日数(日)_2月,日平均値が0.04ppmを超えた日数(日)_3月,1時間値の最高値(ppm)_4月,1時間値の最高値(ppm)_5月,1時間値の最高値(ppm)_6月,1時間値の最高値(ppm)_7月,1時間値の最高値(ppm)_8月,1時間値の最高値(ppm)_9月,1時間値の最高値(ppm)_10月,1時間値の最高値(ppm)_11月,1時間値の最高値(ppm)_12月,1時間値の最高値(ppm)_1月,1時間値の最高値(ppm)_2月,1時間値の最高値(ppm)_3月,日平均値の最高値(ppm)_4月,日平均値の最高値(ppm)_5月,日平均値の最高値(ppm)_6月,日平均値の最高値(ppm)_7月,日平均値の最高値(ppm)_8月,日平均値の最高値(ppm)_9月,日平均値の最高値(ppm)_10月,日平均値の最高値(ppm)_11月,日平均値の最高値(ppm)_12月,日平均値の最高値(ppm)_1月,日平均値の最高値(ppm)_2月,日平均値の最高値(ppm)_3月,月間集計項目8_4月,月間集計項目8_5月,月間集計項目8_6月,月間集計項目8_7月,月間集計項目8_8月,月間集計項目8_9月,月間集計項目8_10月,月間集計項目8_11月,月間集計項目8_12月,月間集計項目8_1月,月間集計項目8_2月,月間集計項目8_3月,月間集計項目9_4月,月間集計項目9_5月,月間集計項目9_6月,月間集計項目9_7月,月間集計項目9_8月,月間集計項目9_9月,月間集計項目9_10月,月間集計項目9_11月,月間集計項目9_12月,月間集計項目9_1月,月間集計項目9_2月,月間集計項目9_3月
2018,1,01,SO2,2,26,京都府,Kyoto-fu,26104,京都市中京区,Kyouto-shi Nakagyou-ku,26104060,壬生,Mibu,1,0,3,準工,560,364,8703,0.004,0,0,0,0,0.013,0.008,O,0,2,,,,,30,31,30,31,31,30,31,30,31,31,28,30,716,740,715,738,740,718,741,715,737,740,670,733,0.004,0.005,0.005,0.006,0.006,0.005,0.003,0.003,0.003,0.003,0.003,0.003,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.013,0.011,0.01,0.012,0.013,0.009,0.006,0.006,0.008,0.007,0.007,0.008,0.007,0.008,0.007,0.009,0.009,0.007,0.005,0.004,0.004,0.004,0.005,0.005,,,,,,,,,,,,,,,,,,,,,,,,
2018,1,01,SO2,3,26,京都府,Kyoto-fu,26109,京都市伏見区,Kyouto-shi Fushimi-ku,26109010,伏見,Fushimi,1,0,3,準工,560,362,8672,0.001,0,0,0,0,0.014,0.003,O,0,3,,,,,30,31,30,31,31,30,31,30,30,29,28,31,712,735,715,736,739,715,738,715,733,729,667,738,0.001,0.002,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.014,0.006,0.004,0.004,0.004,0.003,0.002,0.004,0.004,0.006,0.004,0.004,0.004,0.004,0.002,0.002,0.002,0.002,0.002,0.002,0.003,0.003,0.002,0.002,,,,,,,,,,,,,,,,,,,,,,,,
2018,1,01,SO2,3,26,京都府,Kyoto-fu,26110,京都市山科区,Kyouto-shi Yamashina-ku,26110010,山科,Yamashina,1,0,1,住,560,362,8666,0.001,0,0,0,0,0.012,0.002,O,0,3,,,,,30,31,30,31,31,30,31,30,30,29,28,31,714,735,713,738,738,713,736,715,733,727,666,738,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.012,0.006,0.004,0.003,0.004,0.003,0.002,0.003,0.003,0.004,0.004,0.003,0.003,0.003,0.002,0.002,0.002,0.001,0.001,0.002,0.002,0.002,0.002,0.001,,,,,,,,,,,,,,,,,,,,,,,,
2018,1,01,SO2,3,26,京都府,Kyoto-fu,26111,京都市西京区,Kyouto-shi Nishikyou-ku,26111010,西京,Nishikyou,1,0,1,住,560,361,8661,0.001,0,0,0,0,0.011,0.002,O,0,3,,,,,30,31,30,31,31,30,31,29,30,29,28,31,714,736,715,738,739,713,736,705,733,728,667,737,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0,0.001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.011,0.006,0.004,0.004,0.003,0.002,0.005,0.008,0.003,0.004,0.004,0.006,0.003,0.003,0.002,0.001,0.002,0.001,0.001,0.002,0.001,0.001,0.001,0.002,,,,,,,,,,,,,,,,,,,,,,,,
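If iconv is not available, the same re-encoding step can be done with the Python standard library. This is only a sketch using the input file name from the command above; the output file name is hypothetical, and it assumes cp932 decodes the file cleanly.

import pathlib

src = pathlib.Path("TD20180126.txt")
dst = pathlib.Path("TD20180126_utf8.txt")  # hypothetical output name

# cp932 is a superset of Shift-JIS; errors="replace" keeps the
# conversion from failing on any stray bytes.
text = src.read_text(encoding="cp932", errors="replace")
dst.write_text(text, encoding="utf-8")

# Print the header line, roughly equivalent to `head -n1` above.
print(text.splitlines()[0])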
arthurlli commented 3 years ago

I used to handle it with Excel but haven't tried iconv, which seems interesting. Another way to import the data is to use the Python pandas library with encoding='cp932':

import pandas as pd

dtf = pd.read_csv('j012018_01.txt', sep=',', encoding='cp932')
print(dtf)

It will show the same result as the previous one.
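As a small follow-up sketch: once the file is loaded with encoding='cp932', the Japanese column names can be used directly. The selection below is only an illustration and assumes j012018_01.txt shares the header printed by iconv above.

import pandas as pd

dtf = pd.read_csv("j012018_01.txt", sep=",", encoding="cp932")

# Illustrative selection: measurement year, prefecture name in romaji,
# and measuring station name in romaji (names taken from the header
# shown in the iconv output above).
subset = dtf[["測定年度", "都道府県名_ローマ字", "測定局名_ローマ字"]]
print(subset.head())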