Oslandia / osm-data-classification

Migrated to: https://gitlab.com/Oslandia/osm-data-classification
MIT License

Failed Tasks and dependency #2

Closed gangothri1329 closed 6 years ago

gangothri1329 commented 6 years ago

Hi,

Thank you so much for the tool!

I have been following the README and have installed the required dependencies, but the run still ends with tasks that failed or were left pending. I am running this on Windows. Thanks in advance.


f:\osm-data-classification-master\abc\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
2018-07-06 16:50:50,761 :: INFO :: instance : Loaded []
DEBUG: Checking if AutoKMeans(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-06 16:50:51,870 :: DEBUG :: check_complete : Checking if AutoKMeans(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansReport(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-06 16:50:51,873 :: DEBUG :: check_complete : Checking if KMeansReport(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansAnalysis(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-06 16:50:51,874 :: DEBUG :: check_complete : Checking if KMeansAnalysis(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
INFO: Informed scheduler that task   AutoKMeans_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
2018-07-06 16:50:51,876 :: INFO :: _add_task : Informed scheduler that task   AutoKMeans_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
INFO: Informed scheduler that task   KMeansAnalysis_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
2018-07-06 16:50:51,878 :: INFO :: _add_task : Informed scheduler that task   KMeansAnalysis_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=2) is complete
2018-07-06 16:50:51,882 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=2) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=3) is complete
2018-07-06 16:50:51,883 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=3) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=4) is complete
2018-07-06 16:50:51,884 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=4) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=5) is complete
2018-07-06 16:50:51,885 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=5) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=6) is complete
2018-07-06 16:50:51,887 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=6) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=7) is complete
2018-07-06 16:50:51,888 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=7) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=8) is complete
2018-07-06 16:50:51,889 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=8) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=9) is complete
2018-07-06 16:50:51,893 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, n_components=0, nb_clusters=9) is complete
INFO: Informed scheduler that task   KMeansReport_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
2018-07-06 16:50:51,895 :: INFO :: _add_task : Informed scheduler that task   KMeansReport_data_bordeaux_metropo_user_c76397ef0a   has status   PENDING
DEBUG: Checking if AutoPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-06 16:50:51,898 :: DEBUG :: check_complete : Checking if AutoPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
DEBUG: Checking if PlottingPCAFeatureContributions(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-06 16:50:51,899 :: DEBUG :: check_complete : Checking if PlottingPCAFeatureContributions(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
DEBUG: Checking if PlottingPCACorrelationCircle(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-06 16:50:51,900 :: DEBUG :: check_complete : Checking if PlottingPCACorrelationCircle(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_460c2d45eb   has status   PENDING
2018-07-06 16:50:51,905 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_460c2d45eb   has status   PENDING
DEBUG: Checking if VarianceAnalysisTask(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
2018-07-06 16:50:51,907 :: DEBUG :: check_complete : Checking if VarianceAnalysisTask(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
INFO: Informed scheduler that task   PlottingPCACorrelationCircle_data_bordeaux_metropo_user_3601814570   has status   PENDING
2018-07-06 16:50:51,909 :: INFO :: _add_task : Informed scheduler that task   PlottingPCACorrelationCircle_data_bordeaux_metropo_user_3601814570   has status   PENDING
DEBUG: Checking if MetadataNormalization(datarep=data, dsname=bordeaux-metropole, metadata_type=user) is complete
2018-07-06 16:50:51,910 :: DEBUG :: check_complete : Checking if MetadataNormalization(datarep=data, dsname=bordeaux-metropole, metadata_type=user) is complete
INFO: Informed scheduler that task   VarianceAnalysisTask_data_bordeaux_metropo__28294a788a   has status   PENDING
2018-07-06 16:50:51,911 :: INFO :: _add_task : Informed scheduler that task   VarianceAnalysisTask_data_bordeaux_metropo__28294a788a   has status   PENDING
DEBUG: Checking if OSMElementEnrichment(datarep=data, dsname=bordeaux-metropole) is complete
2018-07-06 16:50:51,915 :: DEBUG :: check_complete : Checking if OSMElementEnrichment(datarep=data, dsname=bordeaux-metropole) is complete
DEBUG: Checking if AddExtraInfoUserMetadata(datarep=data, dsname=bordeaux-metropole, n_top_editor=5) is complete
2018-07-06 16:50:51,916 :: DEBUG :: check_complete : Checking if AddExtraInfoUserMetadata(datarep=data, dsname=bordeaux-metropole, n_top_editor=5) is complete
INFO: Informed scheduler that task   MetadataNormalization_data_bordeaux_metropo_user_7662008ac0   has status   PENDING
2018-07-06 16:50:51,918 :: INFO :: _add_task : Informed scheduler that task   MetadataNormalization_data_bordeaux_metropo_user_7662008ac0   has status   PENDING
DEBUG: Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
2018-07-06 16:50:51,920 :: DEBUG :: check_complete : Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
DEBUG: Checking if UserMetadataExtract(datarep=data, dsname=bordeaux-metropole) is complete
2018-07-06 16:50:51,922 :: DEBUG :: check_complete : Checking if UserMetadataExtract(datarep=data, dsname=bordeaux-metropole) is complete
INFO: Informed scheduler that task   AddExtraInfoUserMetadata_data_bordeaux_metropo_5_f27c0fcbd7   has status   PENDING
2018-07-06 16:50:51,926 :: INFO :: _add_task : Informed scheduler that task   AddExtraInfoUserMetadata_data_bordeaux_metropo_5_f27c0fcbd7   has status   PENDING
DEBUG: Checking if ChangeSetMetadataExtract(datarep=data, dsname=bordeaux-metropole) is complete
2018-07-06 16:50:51,927 :: DEBUG :: check_complete : Checking if ChangeSetMetadataExtract(datarep=data, dsname=bordeaux-metropole) is complete
INFO: Informed scheduler that task   UserMetadataExtract_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
2018-07-06 16:50:51,929 :: INFO :: _add_task : Informed scheduler that task   UserMetadataExtract_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
INFO: Informed scheduler that task   ChangeSetMetadataExtract_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
2018-07-06 16:50:51,930 :: INFO :: _add_task : Informed scheduler that task   ChangeSetMetadataExtract_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
DEBUG: Checking if TopMostUsedEditors(datarep=data) is complete
2018-07-06 16:50:51,931 :: DEBUG :: check_complete : Checking if TopMostUsedEditors(datarep=data) is complete
INFO: Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   PENDING
2018-07-06 16:50:51,933 :: INFO :: _add_task : Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   PENDING
INFO: Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   PENDING
2018-07-06 16:50:51,937 :: INFO :: _add_task : Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   PENDING
DEBUG: Checking if OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole) is complete
2018-07-06 16:50:51,938 :: DEBUG :: check_complete : Checking if OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole) is complete
INFO: Informed scheduler that task   OSMElementEnrichment_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
2018-07-06 16:50:51,940 :: INFO :: _add_task : Informed scheduler that task   OSMElementEnrichment_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
INFO: Informed scheduler that task   OSMHistoryParsing_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
2018-07-06 16:50:51,941 :: INFO :: _add_task : Informed scheduler that task   OSMHistoryParsing_data_bordeaux_metropo_ddd7fcc55b   has status   PENDING
INFO: Informed scheduler that task   PlottingPCAFeatureContributions_data_bordeaux_metropo_user_3601814570   has status   PENDING
2018-07-06 16:50:51,943 :: INFO :: _add_task : Informed scheduler that task   PlottingPCAFeatureContributions_data_bordeaux_metropo_user_3601814570   has status   PENDING
DEBUG: Checking if PlottingVarianceAnalysis(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-06 16:50:51,948 :: DEBUG :: check_complete : Checking if PlottingVarianceAnalysis(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
INFO: Informed scheduler that task   AutoPCA_data_bordeaux_metropo__b79d16b3a5   has status   PENDING
2018-07-06 16:50:51,950 :: INFO :: _add_task : Informed scheduler that task   AutoPCA_data_bordeaux_metropo__b79d16b3a5   has status   PENDING
INFO: Informed scheduler that task   PlottingVarianceAnalysis_data_bordeaux_metropo__b79d16b3a5   has status   PENDING
2018-07-06 16:50:51,951 :: INFO :: _add_task : Informed scheduler that task   PlottingVarianceAnalysis_data_bordeaux_metropo__b79d16b3a5   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_6ca60848b4   has status   PENDING
2018-07-06 16:50:51,954 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_6ca60848b4   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_7e2b172958   has status   PENDING
2018-07-06 16:50:51,957 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_7e2b172958   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_907bd7349a   has status   PENDING
2018-07-06 16:50:51,959 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_907bd7349a   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_d310e9eff3   has status   PENDING
2018-07-06 16:50:51,961 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_d310e9eff3   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_6cc67586f5   has status   PENDING
2018-07-06 16:50:51,963 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_6cc67586f5   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_9581a89357   has status   PENDING
2018-07-06 16:50:51,965 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_9581a89357   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_e37b4903dd   has status   PENDING
2018-07-06 16:50:51,968 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_bordeaux_metropo_user_e37b4903dd   has status   PENDING
INFO: Done scheduling tasks
2018-07-06 16:50:51,969 :: INFO :: _schedule_and_run : Done scheduling tasks
INFO: Running Worker with 1 processes
2018-07-06 16:50:51,970 :: INFO :: run : Running Worker with 1 processes
DEBUG: Asking scheduler for work...
2018-07-06 16:50:51,971 :: DEBUG :: _get_work : Asking scheduler for work...
2018-07-06 16:50:51,972 :: INFO :: prune : Starting pruning of task graph
2018-07-06 16:50:51,973 :: INFO :: prune : Done pruning task graph
DEBUG: Pending tasks: 24
2018-07-06 16:50:51,973 :: DEBUG :: run : Pending tasks: 24
INFO: [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) running   TopMostUsedEditors(datarep=data)
2018-07-06 16:50:51,976 :: INFO :: run : [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) running   TopMostUsedEditors(datarep=data)
ERROR: [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) failed    TopMostUsedEditors(datarep=data)
Traceback (most recent call last):
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\analysis_tasks.py", line 296, in run
    user_editor = pd.read_csv(fobj, header=None, names=['uid', 'value', 'num'])
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
  File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
  File "pandas\_libs\parsers.pyx", line 2173, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28589)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 111158: character maps to <undefined>
2018-07-06 16:50:53,524 :: ERROR :: run : [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) failed    TopMostUsedEditors(datarep=data)
Traceback (most recent call last):
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\analysis_tasks.py", line 296, in run
    user_editor = pd.read_csv(fobj, header=None, names=['uid', 'value', 'num'])
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "f:\osm-data-classification-master\abc\lib\site-packages\pandas\io\parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
  File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
  File "pandas\_libs\parsers.pyx", line 2173, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28589)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 111158: character maps to <undefined>
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-06 16:50:53,545 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   FAILED
2018-07-06 16:50:53,560 :: INFO :: _add_task : Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   FAILED
DEBUG: Asking scheduler for work...
2018-07-06 16:50:53,563 :: DEBUG :: _get_work : Asking scheduler for work...
2018-07-06 16:50:53,565 :: INFO :: prune : Starting pruning of task graph
2018-07-06 16:50:53,567 :: INFO :: prune : Done pruning task graph
DEBUG: Pending tasks: 24
2018-07-06 16:50:53,570 :: DEBUG :: run : Pending tasks: 24
INFO: [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) running   OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
2018-07-06 16:50:53,572 :: INFO :: run : [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) running   OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
<TRACE> Initialization of a TimelineHandler instance !
ERROR: [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) failed    OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
Traceback (most recent call last):
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\data_preparation_tasks.py", line 62, in run
    tlhandler.apply_file(datapath)
RuntimeError: Open failed for 'data\raw\bordeaux-metropole.osh.pbf': The system cannot find the file specified.

2018-07-06 16:50:53,664 :: ERROR :: run : [pid 11560] Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) failed    OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
Traceback (most recent call last):
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "f:\osm-data-classification-master\abc\lib\site-packages\luigi\worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\data_preparation_tasks.py", line 62, in run
    tlhandler.apply_file(datapath)
RuntimeError: Open failed for 'data\raw\bordeaux-metropole.osh.pbf': The system cannot find the file specified.

DEBUG: 1 running tasks, waiting for next task to finish
2018-07-06 16:50:53,703 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   OSMHistoryParsing_data_bordeaux_metropo_ddd7fcc55b   has status   FAILED
2018-07-06 16:50:53,707 :: INFO :: _add_task : Informed scheduler that task   OSMHistoryParsing_data_bordeaux_metropo_ddd7fcc55b   has status   FAILED
DEBUG: Asking scheduler for work...
2018-07-06 16:50:53,708 :: DEBUG :: _get_work : Asking scheduler for work...
2018-07-06 16:50:53,712 :: INFO :: prune : Starting pruning of task graph
2018-07-06 16:50:53,712 :: INFO :: prune : Done pruning task graph
DEBUG: Done
2018-07-06 16:50:53,713 :: DEBUG :: _log_remote_tasks : Done
DEBUG: There are no more tasks to run at this time
2018-07-06 16:50:53,714 :: DEBUG :: _log_remote_tasks : There are no more tasks to run at this time
DEBUG: There are 24 pending tasks possibly being run by other workers
2018-07-06 16:50:53,715 :: DEBUG :: _log_remote_tasks : There are 24 pending tasks possibly being run by other workers
DEBUG: There are 24 pending tasks unique to this worker
2018-07-06 16:50:53,716 :: DEBUG :: _log_remote_tasks : There are 24 pending tasks unique to this worker
DEBUG: There are 24 pending tasks last scheduled by this worker
2018-07-06 16:50:53,717 :: DEBUG :: _log_remote_tasks : There are 24 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) was stopped. Shutting down Keep-Alive thread
2018-07-06 16:50:53,718 :: INFO :: run : Worker Worker(salt=377050056, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=11560) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 24 tasks of which:
* 2 failed:
    - 1 OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
    - 1 TopMostUsedEditors(datarep=data)
* 22 were left pending, among these:
    * 22 had failed dependencies:
        - 1 AddExtraInfoUserMetadata(datarep=data, dsname=bordeaux-metropole, n_top_editor=5)
        - 1 AutoKMeans(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 1 AutoPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
        - 1 ChangeSetMetadataExtract(datarep=data, dsname=bordeaux-metropole)
        - 1 EditorCountByUser(datarep=data, n_top_editor=5)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

2018-07-06 16:50:53,739 :: INFO :: _schedule_and_run :
===== Luigi Execution Summary =====

Scheduled 24 tasks of which:
* 2 failed:
    - 1 OSMHistoryParsing(datarep=data, dsname=bordeaux-metropole)
    - 1 TopMostUsedEditors(datarep=data)
* 22 were left pending, among these:
    * 22 had failed dependencies:
        - 1 AddExtraInfoUserMetadata(datarep=data, dsname=bordeaux-metropole, n_top_editor=5)
        - 1 AutoKMeans(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 1 AutoPCA(datarep=data, dsname=bordeaux-metropole, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
        - 1 ChangeSetMetadataExtract(datarep=data, dsname=bordeaux-metropole)
        - 1 EditorCountByUser(datarep=data, n_top_editor=5)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

delhomer commented 6 years ago

Hi @gangothri1329! Thank you for submitting this issue. It does seem that the project's packaging needs refactoring.

The most informative line in the log is the following one:

RuntimeError: Open failed for 'data\raw\bordeaux-metropole.osh.pbf': The system cannot find the file specified.

Actually, there are several possible explanations there:

Do not hesitate to provide more details about your problem (e.g. which command did you run in your terminal?); I am happy to discuss it further.

gangothri1329 commented 6 years ago

Thanks for the prompt reply.

It seems that the file extension has been renamed from .osh.pbf to .osm.pbf. Renaming the file's extension got rid of the problem.
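For readers hitting the same "Open failed" error, a small pre-flight check can show whether the history file is missing or merely carries a different extension. This is only a sketch: the `data/raw/<dsname>.osh.pbf` layout is taken from the error message in the log, while `find_history_file` is a hypothetical helper that is not part of the project.

```python
from pathlib import Path
import tempfile


def find_history_file(datarep, dsname):
    """Return (path, []) if <datarep>/raw/<dsname>.osh.pbf exists,
    otherwise (None, names of other files that start with <dsname>.)."""
    raw_dir = Path(datarep) / "raw"
    expected = raw_dir / f"{dsname}.osh.pbf"
    if expected.is_file():
        return expected, []
    candidates = []
    if raw_dir.is_dir():
        candidates = sorted(p.name for p in raw_dir.glob(f"{dsname}.*"))
    return None, candidates


# Demonstration in a temporary directory (illustrative layout only).
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "bordeaux-metropole.osm.pbf").touch()
    found, candidates = find_history_file(tmp, "bordeaux-metropole")
    print(found, candidates)
```

If the expected file is absent but a similarly named candidate is listed, the extension mismatch described above is the likely cause.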

I rewrote line 295 of analysis_tasks.py, because the wrong encoding is detected on Windows:
with open(osp.join(self.datarep, OUTPUT_DIR, self.editor_fname), encoding="utf-8") as fobj:
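The reason the explicit encoding matters can be reproduced with the standard library alone. The sketch below assumes the editor file is UTF-8 encoded and that the Windows default codec was cp1252, which is what the "'charmap' codec can't decode byte 0x81" message in the first traceback suggests. The sample row and file name are invented for illustration.

```python
import os
import tempfile

# Hypothetical row (uid, editor name, count). "Á" is stored as bytes C3 81 in UTF-8,
# and byte 0x81 has no mapping in cp1252.
sample = "1,JOSM Á,3\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "editors.csv")
    with open(path, "w", encoding="utf-8", newline="") as fobj:
        fobj.write(sample)

    # Reading with cp1252 mimics open() without an encoding argument on many Windows setups.
    try:
        with open(path, encoding="cp1252") as fobj:
            fobj.read()
        decoded_with_cp1252 = True
    except UnicodeDecodeError:
        decoded_with_cp1252 = False

    # Passing encoding="utf-8" makes the read independent of the platform default.
    with open(path, encoding="utf-8") as fobj:
        text = fobj.read()

print(decoded_with_cp1252, repr(text))
```

A file object opened this way can then be handed to `pd.read_csv` as in the original line 296.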

Still, a new error arises: numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
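One way to narrow down this kind of error is to look for non-finite values in the table that feeds the linear algebra step. The sketch below uses only the standard library and made-up column names; it is a diagnostic aid under those assumptions, not the project's own code. With the data in a pandas DataFrame, `numpy.isfinite` serves the same purpose.

```python
import math


def non_finite_columns(rows):
    """Count NaN or +/-inf values per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, float) and not math.isfinite(value):
                counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical user metadata; the column names are illustrative only.
rows = [
    {"uid": 1, "n_changesets": 4.0, "ratio": 0.5},
    {"uid": 2, "n_changesets": 0.0, "ratio": float("nan")},
    {"uid": 3, "n_changesets": 2.0, "ratio": float("inf")},
]
bad = non_finite_columns(rows)
print(bad)
```

Columns reported here are the ones to inspect upstream, for instance for divisions by zero or missing values introduced during normalization.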

I have set PYTHONPATH to src, and I am running the command from the folder that contains the data and src folders.

Thank you in advance.

F:\osm-data-classification-master>python -m luigi --local-scheduler --module analysis_tasks AutoKMeans --dsname greater-london
C:\Python36_64\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
2018-07-11 13:02:27,511 :: INFO :: instance : Loaded []
DEBUG: Checking if AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,256 :: DEBUG :: check_complete : Checking if AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,258 :: DEBUG :: check_complete : Checking if KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,259 :: DEBUG :: check_complete : Checking if KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
INFO: Informed scheduler that task   AutoKMeans_data_greater_london_user_7d1cf8ed6a   has status   PENDING
2018-07-11 13:02:28,260 :: INFO :: _add_task : Informed scheduler that task   AutoKMeans_data_greater_london_user_7d1cf8ed6a   has status   PENDING
INFO: Informed scheduler that task   KMeansAnalysis_data_greater_london_user_7d1cf8ed6a   has status   PENDING
2018-07-11 13:02:28,262 :: INFO :: _add_task : Informed scheduler that task   KMeansAnalysis_data_greater_london_user_7d1cf8ed6a   has status   PENDING
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2) is complete
2018-07-11 13:02:28,268 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=3) is complete
2018-07-11 13:02:28,270 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=3) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=4) is complete
2018-07-11 13:02:28,271 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=4) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=5) is complete
2018-07-11 13:02:28,272 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=5) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=6) is complete
2018-07-11 13:02:28,273 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=6) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=7) is complete
2018-07-11 13:02:28,274 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=7) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=8) is complete
2018-07-11 13:02:28,276 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=8) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=9) is complete
2018-07-11 13:02:28,277 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=9) is complete
INFO: Informed scheduler that task   KMeansReport_data_greater_london_user_7d1cf8ed6a   has status   PENDING
2018-07-11 13:02:28,282 :: INFO :: _add_task : Informed scheduler that task   KMeansReport_data_greater_london_user_7d1cf8ed6a   has status   PENDING
DEBUG: Checking if AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-11 13:02:28,287 :: DEBUG :: check_complete : Checking if AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
DEBUG: Checking if PlottingPCAFeatureContributions(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-11 13:02:28,289 :: DEBUG :: check_complete : Checking if PlottingPCAFeatureContributions(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
DEBUG: Checking if PlottingPCACorrelationCircle(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-11 13:02:28,295 :: DEBUG :: check_complete : Checking if PlottingPCACorrelationCircle(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_732745e15a   has status   PENDING
2018-07-11 13:02:28,298 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_732745e15a   has status   PENDING
DEBUG: Checking if VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
2018-07-11 13:02:28,301 :: DEBUG :: check_complete : Checking if VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
INFO: Informed scheduler that task   PlottingPCACorrelationCircle_data_greater_london_user_36034479b3   has status   PENDING
2018-07-11 13:02:28,306 :: INFO :: _add_task : Informed scheduler that task   PlottingPCACorrelationCircle_data_greater_london_user_36034479b3   has status   PENDING
DEBUG: Checking if MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user) is complete
2018-07-11 13:02:28,308 :: DEBUG :: check_complete : Checking if MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user) is complete
INFO: Informed scheduler that task   VarianceAnalysisTask_data_greater_london__1f9fa48625   has status   PENDING
2018-07-11 13:02:28,310 :: INFO :: _add_task : Informed scheduler that task   VarianceAnalysisTask_data_greater_london__1f9fa48625   has status   PENDING
DEBUG: Checking if OSMElementEnrichment(datarep=data, dsname=greater-london) is complete
2018-07-11 13:02:28,314 :: DEBUG :: check_complete : Checking if OSMElementEnrichment(datarep=data, dsname=greater-london) is complete
DEBUG: Checking if AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5) is complete
2018-07-11 13:02:28,318 :: DEBUG :: check_complete : Checking if AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5) is complete
INFO: Informed scheduler that task   MetadataNormalization_data_greater_london_user_3e178c4ed0   has status   PENDING
2018-07-11 13:02:28,320 :: INFO :: _add_task : Informed scheduler that task   MetadataNormalization_data_greater_london_user_3e178c4ed0   has status   PENDING
DEBUG: Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
2018-07-11 13:02:28,322 :: DEBUG :: check_complete : Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
DEBUG: Checking if UserMetadataExtract(datarep=data, dsname=greater-london) is complete
2018-07-11 13:02:28,324 :: DEBUG :: check_complete : Checking if UserMetadataExtract(datarep=data, dsname=greater-london) is complete
INFO: Informed scheduler that task   AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6   has status   PENDING
2018-07-11 13:02:28,330 :: INFO :: _add_task : Informed scheduler that task   AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6   has status   PENDING
INFO: Informed scheduler that task   UserMetadataExtract_data_greater_london_384d14cdba   has status   DONE
2018-07-11 13:02:28,332 :: INFO :: _add_task : Informed scheduler that task   UserMetadataExtract_data_greater_london_384d14cdba   has status   DONE
DEBUG: Checking if TopMostUsedEditors(datarep=data) is complete
2018-07-11 13:02:28,334 :: DEBUG :: check_complete : Checking if TopMostUsedEditors(datarep=data) is complete
INFO: Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   PENDING
2018-07-11 13:02:28,340 :: INFO :: _add_task : Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   PENDING
INFO: Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   DONE
2018-07-11 13:02:28,342 :: INFO :: _add_task : Informed scheduler that task   TopMostUsedEditors_data_c4ab8ddf6b   has status   DONE
INFO: Informed scheduler that task   OSMElementEnrichment_data_greater_london_384d14cdba   has status   DONE
2018-07-11 13:02:28,343 :: INFO :: _add_task : Informed scheduler that task   OSMElementEnrichment_data_greater_london_384d14cdba   has status   DONE
INFO: Informed scheduler that task   PlottingPCAFeatureContributions_data_greater_london_user_36034479b3   has status   PENDING
2018-07-11 13:02:28,345 :: INFO :: _add_task : Informed scheduler that task   PlottingPCAFeatureContributions_data_greater_london_user_36034479b3   has status   PENDING
DEBUG: Checking if PlottingVarianceAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-11 13:02:28,350 :: DEBUG :: check_complete : Checking if PlottingVarianceAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
INFO: Informed scheduler that task   AutoPCA_data_greater_london__781ce5a98f   has status   PENDING
2018-07-11 13:02:28,354 :: INFO :: _add_task : Informed scheduler that task   AutoPCA_data_greater_london__781ce5a98f   has status   PENDING
INFO: Informed scheduler that task   PlottingVarianceAnalysis_data_greater_london__781ce5a98f   has status   PENDING
2018-07-11 13:02:28,355 :: INFO :: _add_task : Informed scheduler that task   PlottingVarianceAnalysis_data_greater_london__781ce5a98f   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_40e76c091f   has status   PENDING
2018-07-11 13:02:28,358 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_40e76c091f   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_822025df00   has status   PENDING
2018-07-11 13:02:28,360 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_822025df00   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_6c413d58ee   has status   PENDING
2018-07-11 13:02:28,366 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_6c413d58ee   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_8743fe0f38   has status   PENDING
2018-07-11 13:02:28,368 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_8743fe0f38   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_accdfd39aa   has status   PENDING
2018-07-11 13:02:28,370 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_accdfd39aa   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_0aac9b95a7   has status   PENDING
2018-07-11 13:02:28,375 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_0aac9b95a7   has status   PENDING
INFO: Informed scheduler that task   KMeansFromPCA_data_greater_london_user_06412000de   has status   PENDING
2018-07-11 13:02:28,377 :: INFO :: _add_task : Informed scheduler that task   KMeansFromPCA_data_greater_london_user_06412000de   has status   PENDING
INFO: Done scheduling tasks
2018-07-11 13:02:28,378 :: INFO :: _schedule_and_run : Done scheduling tasks
INFO: Running Worker with 1 processes
2018-07-11 13:02:28,379 :: INFO :: run : Running Worker with 1 processes
DEBUG: Asking scheduler for work...
2018-07-11 13:02:28,380 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 19
2018-07-11 13:02:28,386 :: DEBUG :: run : Pending tasks: 19
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   EditorCountByUser(datarep=data, n_top_editor=5)
2018-07-11 13:02:28,387 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   EditorCountByUser(datarep=data, n_top_editor=5)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      EditorCountByUser(datarep=data, n_top_editor=5)
2018-07-11 13:03:52,192 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      EditorCountByUser(datarep=data, n_top_editor=5)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:03:52,197 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   DONE
2018-07-11 13:03:52,198 :: INFO :: _add_task : Informed scheduler that task   EditorCountByUser_data_5_2262508398   has status   DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:03:52,199 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 18
2018-07-11 13:03:52,201 :: DEBUG :: run : Pending tasks: 18
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
2018-07-11 13:03:52,201 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
2018-07-11 13:03:54,158 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:03:54,161 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6   has status   DONE
2018-07-11 13:03:54,164 :: INFO :: _add_task : Informed scheduler that task   AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6   has status   DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:03:54,165 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 17
2018-07-11 13:03:54,166 :: DEBUG :: run : Pending tasks: 17
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
2018-07-11 13:03:54,167 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
C:\Python36_64\lib\site-packages\numpy\lib\arraysetops.py:472: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
2018-07-11 13:04:15,044 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done      MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:04:15,047 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   MetadataNormalization_data_greater_london_user_3e178c4ed0   has status   DONE
2018-07-11 13:04:15,050 :: INFO :: _add_task : Informed scheduler that task   MetadataNormalization_data_greater_london_user_3e178c4ed0   has status   DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:04:15,051 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 16
2018-07-11 13:04:15,053 :: DEBUG :: run : Pending tasks: 16
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
2018-07-11 13:04:15,053 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running   VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
F:\osm-data-classification-master\src\unsupervised_learning.py:35: RuntimeWarning: Degrees of freedom <= 0 for slice
  cov_mat = np.cov(X.T)
C:\Python36_64\lib\site-packages\numpy\lib\function_base.py:3109: RuntimeWarning: divide by zero encountered in double_scalars
  c *= 1. / np.float64(fact)
C:\Python36_64\lib\site-packages\numpy\lib\function_base.py:3109: RuntimeWarning: invalid value encountered in multiply
  c *= 1. / np.float64(fact)
ERROR: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) failed    VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
Traceback (most recent call last):
  File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\analysis_tasks.py", line 526, in run
    var_analysis = ul.compute_pca_variance(X)
  File "F:\osm-data-classification-master\src\unsupervised_learning.py", line 36, in compute_pca_variance
    eig_vals, eig_vecs = np.linalg.eig(cov_mat)
  File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 1143, in eig
    _assertFinite(a)
  File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 216, in _assertFinite
    raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
2018-07-11 13:04:15,309 :: ERROR :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) failed    VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
Traceback (most recent call last):
  File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 203, in run
    new_deps = self._run_get_new_deps()
  File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
    task_gen = self.task.run()
  File "F:\osm-data-classification-master\src\analysis_tasks.py", line 526, in run
    var_analysis = ul.compute_pca_variance(X)
  File "F:\osm-data-classification-master\src\unsupervised_learning.py", line 36, in compute_pca_variance
    eig_vals, eig_vecs = np.linalg.eig(cov_mat)
  File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 1143, in eig
    _assertFinite(a)
  File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 216, in _assertFinite
    raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:04:15,369 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task   VarianceAnalysisTask_data_greater_london__1f9fa48625   has status   FAILED
2018-07-11 13:04:15,385 :: INFO :: _add_task : Informed scheduler that task   VarianceAnalysisTask_data_greater_london__1f9fa48625   has status   FAILED
DEBUG: Asking scheduler for work...
2018-07-11 13:04:15,387 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Done
2018-07-11 13:04:15,391 :: DEBUG :: _log_remote_tasks : Done
DEBUG: There are no more tasks to run at this time
2018-07-11 13:04:15,394 :: DEBUG :: _log_remote_tasks : There are no more tasks to run at this time
DEBUG: There are 16 pending tasks possibly being run by other workers
2018-07-11 13:04:15,396 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks possibly being run by other workers
DEBUG: There are 16 pending tasks unique to this worker
2018-07-11 13:04:15,399 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks unique to this worker
DEBUG: There are 16 pending tasks last scheduled by this worker
2018-07-11 13:04:15,400 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) was stopped. Shutting down Keep-Alive thread
2018-07-11 13:04:15,402 :: INFO :: run : Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====

Scheduled 22 tasks of which:
* 3 present dependencies were encountered:
    - 1 OSMElementEnrichment(datarep=data, dsname=greater-london)
    - 1 TopMostUsedEditors(datarep=data)
    - 1 UserMetadataExtract(datarep=data, dsname=greater-london)
* 3 ran successfully:
    - 1 AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
    - 1 EditorCountByUser(datarep=data, n_top_editor=5)
    - 1 MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
* 1 failed:
    - 1 VarianceAnalysisTask(...)
* 15 were left pending, among these:
    * 15 had failed dependencies:
        - 1 AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 1 AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
        - 1 KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 8 KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2...9)
        - 1 KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

2018-07-11 13:04:15,433 :: INFO :: _schedule_and_run :
===== Luigi Execution Summary =====

Scheduled 22 tasks of which:
* 3 present dependencies were encountered:
    - 1 OSMElementEnrichment(datarep=data, dsname=greater-london)
    - 1 TopMostUsedEditors(datarep=data)
    - 1 UserMetadataExtract(datarep=data, dsname=greater-london)
* 3 ran successfully:
    - 1 AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
    - 1 EditorCountByUser(datarep=data, n_top_editor=5)
    - 1 MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
* 1 failed:
    - 1 VarianceAnalysisTask(...)
* 15 were left pending, among these:
    * 15 had failed dependencies:
        - 1 AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 1 AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
        - 1 KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        - 8 KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2...9)
        - 1 KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
        ...

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====
delhomer commented 6 years ago

OK, first, I've pushed a small PR to fix your encoding issue (see PR #3). Could you run the code on the corresponding branch to check that the encoding problem is gone? Do not hesitate to open a separate issue if a new problem shows up later (or even to submit some PRs as well :) ).

Then, be careful with the file extension. We do need osh.pbf files, as they contain "history" data, whereas osm.pbf files are standard OSM data (snapshots of the OSM data at a specific date). You should (must!) get a valid OSM history dataset before running the pipeline.
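
To illustrate why an unsuitable dataset can end in the `LinAlgError` from the log above, here is a minimal sketch. It assumes the feature matrix `X` passed to `np.cov(X.T)` ended up with a single row, which is one way to get the "Degrees of freedom <= 0" warning; I have not checked that this is exactly what happened in your run. The helper `covariance_is_usable` is hypothetical and not part of the project.

```python
import warnings

import numpy as np


def covariance_is_usable(X):
    """Return True if np.cov(X.T) is well defined and finite.

    X is assumed to be a 2-D array with one row per user and one column
    per feature. Hypothetical helper, not part of the project code.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        # With fewer than two rows, the default ddof=1 leaves zero degrees
        # of freedom, so np.cov would return NaN values.
        return False
    return bool(np.isfinite(np.cov(X.T)).all())


# Assumed failure case: a feature matrix with one row only.
X_bad = np.array([[1.0, 2.0, 3.0]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    cov_bad = np.cov(X_bad.T)  # 3x3 matrix filled with NaN

try:
    np.linalg.eig(cov_bad)
    eig_failed = False
except np.linalg.LinAlgError:
    # Same error class as in the traceback: the array contains NaN values.
    eig_failed = True

# A matrix with several rows gives a finite covariance matrix.
X_ok = np.array([[1.0, 2.0, 3.0], [2.0, 1.0, 5.0], [4.0, 0.0, 1.0]])
print(covariance_is_usable(X_bad), covariance_is_usable(X_ok), eig_failed)
```

A check of this kind before the eigendecomposition would turn the failure into a clearer message about the input data, but the real fix is still to use the right dataset.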

delhomer commented 6 years ago

Oh, Geofabrik seems to have changed its download policy since GDPR. You need an OSM contributor account and must log in on the Geofabrik website to be able to download osh.pbf files. I will update the project README accordingly.
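
Once the file is downloaded, a quick look at its name can catch the osm.pbf / osh.pbf mix-up before the pipeline starts. This is only a sketch: `is_history_extract` is a hypothetical helper, the file names are illustrative, and it inspects the name only, not the file content.

```python
from pathlib import Path


def is_history_extract(path):
    """Return True if the file name ends with the .osh.pbf history suffix.

    Hypothetical helper: it only looks at the name, not at the content.
    """
    return Path(path).name.lower().endswith(".osh.pbf")


# Illustrative file names, not paths taken from the project.
for name in ("greater-london.osh.pbf", "greater-london.osm.pbf"):
    label = "history file" if is_history_extract(name) else "not a history file"
    print(name, "->", label)
```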

gangothri1329 commented 6 years ago

First, PR #3 did the trick on Windows, and downloading osh.pbf instead of osm.pbf works!

Here is the Luigi execution summary:


Scheduled 23 tasks of which:
* 1 present dependencies were encountered:
    - 1 EditorCountByUser(datarep=data, n_top_editor=5)
* 22 ran successfully:
    - 1 AddExtraInfoUserMetadata(datarep=data, dsname=antarctica-internal, n_top_editor=5)
    - 1 AutoKMeans(datarep=data, dsname=antarctica-internal, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
    - 1 AutoPCA(datarep=data, dsname=antarctica-internal, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
    - 1 ChangeSetMetadataExtract(datarep=data, dsname=antarctica-internal)
    - 1 KMeansAnalysis(datarep=data, dsname=antarctica-internal, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
    ...

This progress looks :) because there were no failed tasks or missing external dependencies

===== Luigi Execution Summary =====

Thanks for your prompt response :)