Closed gangothri1329 closed 6 years ago
Hi @gangothri1329! Thank you for submitting this issue; it does seem that the project needs a packaging refactoring.
The most informative line in the log is the following one:
RuntimeError: Open failed for 'data\raw\bordeaux-metropole.osh.pbf': The system cannot find the file specified.
There are several possible explanations:
- Do you have the bordeaux-metropole data, as written in the log? Be sure to specify the dataset you want to use with the dsname argument if you want to focus on another place.
- Did you set the datarep argument? The command should be run from the src folder, so your data folder may not be in the expected place (as a reminder, datarep==data by default).
Do not hesitate to provide more details on your problem (e.g. which command are you running in your terminal?); I will be around to discuss it further!
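The two checks above can be sketched as a small script. It rebuilds the path shown in the log line (datarep, then raw, then the dataset name with the osh.pbf extension) and reports where that path resolves from the current working directory. The helper names are hypothetical and not part of the project, and the default values are assumptions taken from this thread.

```python
from pathlib import Path


def expected_history_file(datarep="data", dsname="bordeaux-metropole"):
    """Hypothetical helper: build <datarep>/raw/<dsname>.osh.pbf, as seen in the log."""
    return Path(datarep) / "raw" / f"{dsname}.osh.pbf"


def check_input(datarep="data", dsname="bordeaux-metropole"):
    """Resolve the expected file against the current directory and test that it exists."""
    resolved = (Path.cwd() / expected_history_file(datarep, dsname)).resolve()
    return resolved, resolved.is_file()


if __name__ == "__main__":
    resolved, found = check_input()
    status = "found" if found else "MISSING"
    print(f"{status}: {resolved}")
```

Running it from the same folder as the luigi command shows whether the relative data folder points where you expect.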
Thanks for the prompt reply.
It seems that the osh.pbf extension has been renamed to osm.pbf. Renaming the extension got rid of the problem.
Line 295 of analysis_tasks.py is rewritten as follows, because the wrong encoding is detected on Windows:
with open(osp.join(self.datarep, OUTPUT_DIR, self.editor_fname), encoding="utf-8") as fobj:
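As a minimal sketch of why the explicit argument matters: when open() is called without encoding, Python falls back to the locale's preferred encoding, which on Windows is often a legacy code page rather than UTF-8. The helper name read_text_utf8 and the sample file are hypothetical and only illustrate the idea.

```python
import locale
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8, regardless of the platform default."""
    with open(path, encoding="utf-8") as fobj:
        return fobj.read()


if __name__ == "__main__":
    # What open() would use if no encoding were given.
    print("Default text encoding:", locale.getpreferredencoding(False))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as fobj:
            fobj.write("caf\u00e9\n")
        # ascii() keeps the demo printable on any console.
        print(ascii(read_text_utf8(path)))
```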
However, a new error arises: numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
I have set PYTHONPATH to src, and I am running the command from the folder that contains the data and src folders.
Thank you in advance.
F:\osm-data-classification-master>python -m luigi --local-scheduler --module analysis_tasks AutoKMeans --dsname greater-london
C:\Python36_64\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
2018-07-11 13:02:27,511 :: INFO :: instance : Loaded []
DEBUG: Checking if AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,256 :: DEBUG :: check_complete : Checking if AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,258 :: DEBUG :: check_complete : Checking if KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
DEBUG: Checking if KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
2018-07-11 13:02:28,259 :: DEBUG :: check_complete : Checking if KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8) is complete
INFO: Informed scheduler that task AutoKMeans_data_greater_london_user_7d1cf8ed6a has status PENDING
2018-07-11 13:02:28,260 :: INFO :: _add_task : Informed scheduler that task AutoKMeans_data_greater_london_user_7d1cf8ed6a has status PENDING
INFO: Informed scheduler that task KMeansAnalysis_data_greater_london_user_7d1cf8ed6a has status PENDING
2018-07-11 13:02:28,262 :: INFO :: _add_task : Informed scheduler that task KMeansAnalysis_data_greater_london_user_7d1cf8ed6a has status PENDING
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2) is complete
2018-07-11 13:02:28,268 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=3) is complete
2018-07-11 13:02:28,270 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=3) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=4) is complete
2018-07-11 13:02:28,271 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=4) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=5) is complete
2018-07-11 13:02:28,272 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=5) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=6) is complete
2018-07-11 13:02:28,273 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=6) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=7) is complete
2018-07-11 13:02:28,274 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=7) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=8) is complete
2018-07-11 13:02:28,276 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=8) is complete
DEBUG: Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=9) is complete
2018-07-11 13:02:28,277 :: DEBUG :: check_complete : Checking if KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=9) is complete
INFO: Informed scheduler that task KMeansReport_data_greater_london_user_7d1cf8ed6a has status PENDING
2018-07-11 13:02:28,282 :: INFO :: _add_task : Informed scheduler that task KMeansReport_data_greater_london_user_7d1cf8ed6a has status PENDING
DEBUG: Checking if AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-11 13:02:28,287 :: DEBUG :: check_complete : Checking if AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
DEBUG: Checking if PlottingPCAFeatureContributions(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-11 13:02:28,289 :: DEBUG :: check_complete : Checking if PlottingPCAFeatureContributions(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
DEBUG: Checking if PlottingPCACorrelationCircle(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
2018-07-11 13:02:28,295 :: DEBUG :: check_complete : Checking if PlottingPCACorrelationCircle(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12) is complete
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_732745e15a has status PENDING
2018-07-11 13:02:28,298 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_732745e15a has status PENDING
DEBUG: Checking if VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
2018-07-11 13:02:28,301 :: DEBUG :: check_complete : Checking if VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=) is complete
INFO: Informed scheduler that task PlottingPCACorrelationCircle_data_greater_london_user_36034479b3 has status PENDING
2018-07-11 13:02:28,306 :: INFO :: _add_task : Informed scheduler that task PlottingPCACorrelationCircle_data_greater_london_user_36034479b3 has status PENDING
DEBUG: Checking if MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user) is complete
2018-07-11 13:02:28,308 :: DEBUG :: check_complete : Checking if MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user) is complete
INFO: Informed scheduler that task VarianceAnalysisTask_data_greater_london__1f9fa48625 has status PENDING
2018-07-11 13:02:28,310 :: INFO :: _add_task : Informed scheduler that task VarianceAnalysisTask_data_greater_london__1f9fa48625 has status PENDING
DEBUG: Checking if OSMElementEnrichment(datarep=data, dsname=greater-london) is complete
2018-07-11 13:02:28,314 :: DEBUG :: check_complete : Checking if OSMElementEnrichment(datarep=data, dsname=greater-london) is complete
DEBUG: Checking if AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5) is complete
2018-07-11 13:02:28,318 :: DEBUG :: check_complete : Checking if AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5) is complete
INFO: Informed scheduler that task MetadataNormalization_data_greater_london_user_3e178c4ed0 has status PENDING
2018-07-11 13:02:28,320 :: INFO :: _add_task : Informed scheduler that task MetadataNormalization_data_greater_london_user_3e178c4ed0 has status PENDING
DEBUG: Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
2018-07-11 13:02:28,322 :: DEBUG :: check_complete : Checking if EditorCountByUser(datarep=data, n_top_editor=5) is complete
DEBUG: Checking if UserMetadataExtract(datarep=data, dsname=greater-london) is complete
2018-07-11 13:02:28,324 :: DEBUG :: check_complete : Checking if UserMetadataExtract(datarep=data, dsname=greater-london) is complete
INFO: Informed scheduler that task AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6 has status PENDING
2018-07-11 13:02:28,330 :: INFO :: _add_task : Informed scheduler that task AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6 has status PENDING
INFO: Informed scheduler that task UserMetadataExtract_data_greater_london_384d14cdba has status DONE
2018-07-11 13:02:28,332 :: INFO :: _add_task : Informed scheduler that task UserMetadataExtract_data_greater_london_384d14cdba has status DONE
DEBUG: Checking if TopMostUsedEditors(datarep=data) is complete
2018-07-11 13:02:28,334 :: DEBUG :: check_complete : Checking if TopMostUsedEditors(datarep=data) is complete
INFO: Informed scheduler that task EditorCountByUser_data_5_2262508398 has status PENDING
2018-07-11 13:02:28,340 :: INFO :: _add_task : Informed scheduler that task EditorCountByUser_data_5_2262508398 has status PENDING
INFO: Informed scheduler that task TopMostUsedEditors_data_c4ab8ddf6b has status DONE
2018-07-11 13:02:28,342 :: INFO :: _add_task : Informed scheduler that task TopMostUsedEditors_data_c4ab8ddf6b has status DONE
INFO: Informed scheduler that task OSMElementEnrichment_data_greater_london_384d14cdba has status DONE
2018-07-11 13:02:28,343 :: INFO :: _add_task : Informed scheduler that task OSMElementEnrichment_data_greater_london_384d14cdba has status DONE
INFO: Informed scheduler that task PlottingPCAFeatureContributions_data_greater_london_user_36034479b3 has status PENDING
2018-07-11 13:02:28,345 :: INFO :: _add_task : Informed scheduler that task PlottingPCAFeatureContributions_data_greater_london_user_36034479b3 has status PENDING
DEBUG: Checking if PlottingVarianceAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
2018-07-11 13:02:28,350 :: DEBUG :: check_complete : Checking if PlottingVarianceAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=) is complete
INFO: Informed scheduler that task AutoPCA_data_greater_london__781ce5a98f has status PENDING
2018-07-11 13:02:28,354 :: INFO :: _add_task : Informed scheduler that task AutoPCA_data_greater_london__781ce5a98f has status PENDING
INFO: Informed scheduler that task PlottingVarianceAnalysis_data_greater_london__781ce5a98f has status PENDING
2018-07-11 13:02:28,355 :: INFO :: _add_task : Informed scheduler that task PlottingVarianceAnalysis_data_greater_london__781ce5a98f has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_40e76c091f has status PENDING
2018-07-11 13:02:28,358 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_40e76c091f has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_822025df00 has status PENDING
2018-07-11 13:02:28,360 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_822025df00 has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_6c413d58ee has status PENDING
2018-07-11 13:02:28,366 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_6c413d58ee has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_8743fe0f38 has status PENDING
2018-07-11 13:02:28,368 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_8743fe0f38 has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_accdfd39aa has status PENDING
2018-07-11 13:02:28,370 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_accdfd39aa has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_0aac9b95a7 has status PENDING
2018-07-11 13:02:28,375 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_0aac9b95a7 has status PENDING
INFO: Informed scheduler that task KMeansFromPCA_data_greater_london_user_06412000de has status PENDING
2018-07-11 13:02:28,377 :: INFO :: _add_task : Informed scheduler that task KMeansFromPCA_data_greater_london_user_06412000de has status PENDING
INFO: Done scheduling tasks
2018-07-11 13:02:28,378 :: INFO :: _schedule_and_run : Done scheduling tasks
INFO: Running Worker with 1 processes
2018-07-11 13:02:28,379 :: INFO :: run : Running Worker with 1 processes
DEBUG: Asking scheduler for work...
2018-07-11 13:02:28,380 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 19
2018-07-11 13:02:28,386 :: DEBUG :: run : Pending tasks: 19
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running EditorCountByUser(datarep=data, n_top_editor=5)
2018-07-11 13:02:28,387 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running EditorCountByUser(datarep=data, n_top_editor=5)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done EditorCountByUser(datarep=data, n_top_editor=5)
2018-07-11 13:03:52,192 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done EditorCountByUser(datarep=data, n_top_editor=5)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:03:52,197 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task EditorCountByUser_data_5_2262508398 has status DONE
2018-07-11 13:03:52,198 :: INFO :: _add_task : Informed scheduler that task EditorCountByUser_data_5_2262508398 has status DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:03:52,199 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 18
2018-07-11 13:03:52,201 :: DEBUG :: run : Pending tasks: 18
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
2018-07-11 13:03:52,201 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
2018-07-11 13:03:54,158 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:03:54,161 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6 has status DONE
2018-07-11 13:03:54,164 :: INFO :: _add_task : Informed scheduler that task AddExtraInfoUserMetadata_data_greater_london_5_3ec300f5d6 has status DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:03:54,165 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 17
2018-07-11 13:03:54,166 :: DEBUG :: run : Pending tasks: 17
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
2018-07-11 13:03:54,167 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
C:\Python36_64\lib\site-packages\numpy\lib\arraysetops.py:472: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
mask |= (ar1 == a)
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
2018-07-11 13:04:15,044 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) done MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:04:15,047 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task MetadataNormalization_data_greater_london_user_3e178c4ed0 has status DONE
2018-07-11 13:04:15,050 :: INFO :: _add_task : Informed scheduler that task MetadataNormalization_data_greater_london_user_3e178c4ed0 has status DONE
DEBUG: Asking scheduler for work...
2018-07-11 13:04:15,051 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Pending tasks: 16
2018-07-11 13:04:15,053 :: DEBUG :: run : Pending tasks: 16
INFO: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
2018-07-11 13:04:15,053 :: INFO :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) running VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
F:\osm-data-classification-master\src\unsupervised_learning.py:35: RuntimeWarning: Degrees of freedom <= 0 for slice
cov_mat = np.cov(X.T)
C:\Python36_64\lib\site-packages\numpy\lib\function_base.py:3109: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
C:\Python36_64\lib\site-packages\numpy\lib\function_base.py:3109: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)
ERROR: [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) failed VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
Traceback (most recent call last):
File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 203, in run
new_deps = self._run_get_new_deps()
File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
task_gen = self.task.run()
File "F:\osm-data-classification-master\src\analysis_tasks.py", line 526, in run
var_analysis = ul.compute_pca_variance(X)
File "F:\osm-data-classification-master\src\unsupervised_learning.py", line 36, in compute_pca_variance
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 1143, in eig
_assertFinite(a)
File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 216, in _assertFinite
raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
2018-07-11 13:04:15,309 :: ERROR :: run : [pid 20328] Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) failed VarianceAnalysisTask(datarep=data, dsname=greater-london, metadata_type=user, nb_mindimensions=3, nb_maxdimensions=12, features=)
Traceback (most recent call last):
File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 203, in run
new_deps = self._run_get_new_deps()
File "C:\Python36_64\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
task_gen = self.task.run()
File "F:\osm-data-classification-master\src\analysis_tasks.py", line 526, in run
var_analysis = ul.compute_pca_variance(X)
File "F:\osm-data-classification-master\src\unsupervised_learning.py", line 36, in compute_pca_variance
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 1143, in eig
_assertFinite(a)
File "C:\Python36_64\lib\site-packages\numpy\linalg\linalg.py", line 216, in _assertFinite
raise LinAlgError("Array must not contain infs or NaNs")
numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-11 13:04:15,369 :: DEBUG :: run : 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task VarianceAnalysisTask_data_greater_london__1f9fa48625 has status FAILED
2018-07-11 13:04:15,385 :: INFO :: _add_task : Informed scheduler that task VarianceAnalysisTask_data_greater_london__1f9fa48625 has status FAILED
DEBUG: Asking scheduler for work...
2018-07-11 13:04:15,387 :: DEBUG :: _get_work : Asking scheduler for work...
DEBUG: Done
2018-07-11 13:04:15,391 :: DEBUG :: _log_remote_tasks : Done
DEBUG: There are no more tasks to run at this time
2018-07-11 13:04:15,394 :: DEBUG :: _log_remote_tasks : There are no more tasks to run at this time
DEBUG: There are 16 pending tasks possibly being run by other workers
2018-07-11 13:04:15,396 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks possibly being run by other workers
DEBUG: There are 16 pending tasks unique to this worker
2018-07-11 13:04:15,399 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks unique to this worker
DEBUG: There are 16 pending tasks last scheduled by this worker
2018-07-11 13:04:15,400 :: DEBUG :: _log_remote_tasks : There are 16 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) was stopped. Shutting down Keep-Alive thread
2018-07-11 13:04:15,402 :: INFO :: run : Worker Worker(salt=236886626, workers=1, host=LAPTOP-CGLPDH3E, username=Gangothri, pid=20328) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 22 tasks of which:
* 3 present dependencies were encountered:
- 1 OSMElementEnrichment(datarep=data, dsname=greater-london)
- 1 TopMostUsedEditors(datarep=data)
- 1 UserMetadataExtract(datarep=data, dsname=greater-london)
* 3 ran successfully:
- 1 AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
- 1 EditorCountByUser(datarep=data, n_top_editor=5)
- 1 MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
* 1 failed:
- 1 VarianceAnalysisTask(...)
* 15 were left pending, among these:
* 15 had failed dependencies:
- 1 AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
- 1 AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
- 1 KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
- 8 KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2...9)
- 1 KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
...
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
2018-07-11 13:04:15,433 :: INFO :: _schedule_and_run :
===== Luigi Execution Summary =====
Scheduled 22 tasks of which:
* 3 present dependencies were encountered:
- 1 OSMElementEnrichment(datarep=data, dsname=greater-london)
- 1 TopMostUsedEditors(datarep=data)
- 1 UserMetadataExtract(datarep=data, dsname=greater-london)
* 3 ran successfully:
- 1 AddExtraInfoUserMetadata(datarep=data, dsname=greater-london, n_top_editor=5)
- 1 EditorCountByUser(datarep=data, n_top_editor=5)
- 1 MetadataNormalization(datarep=data, dsname=greater-london, metadata_type=user)
* 1 failed:
- 1 VarianceAnalysisTask(...)
* 15 were left pending, among these:
* 15 had failed dependencies:
- 1 AutoKMeans(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
- 1 AutoPCA(datarep=data, dsname=greater-london, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
- 1 KMeansAnalysis(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
- 8 KMeansFromPCA(datarep=data, dsname=greater-london, metadata_type=user, n_components=0, nb_clusters=2...9)
- 1 KMeansReport(datarep=data, dsname=greater-london, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
...
This progress looks :( because there were failed tasks
===== Luigi Execution Summary =====
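In the traceback above, the warning "Degrees of freedom <= 0 for slice" is raised at np.cov(X.T); NumPy emits it when the covariance is computed from fewer than two observations, and the resulting NaNs then make np.linalg.eig fail. A minimal sketch of a guard, assuming X is a 2-D table with one row per user, is given below. The function name is hypothetical, and the sketch uses np.linalg.eigh (suited to symmetric matrices) where the project code calls np.linalg.eig.

```python
import numpy as np


def pca_variance_ratios(X):
    """Return explained-variance ratios, sorted from largest to smallest."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two rows (observations)")
    if not np.isfinite(X).all():
        raise ValueError("input contains infs or NaNs")
    # Rows of X.T are features, columns are observations.
    cov_mat = np.atleast_2d(np.cov(X.T))
    eig_vals, _ = np.linalg.eigh(cov_mat)   # ascending order
    eig_vals = np.clip(eig_vals[::-1], 0.0, None)
    total = eig_vals.sum()
    if total == 0:
        raise ValueError("all features are constant; no variance to explain")
    return eig_vals / total
```

With such a check, an empty or single-row input fails early with a readable message instead of a LinAlgError deep inside NumPy.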
OK, first, I've pushed a small PR to fix your encoding issue (see PR #3). Could you run the code on the corresponding branch to verify the encoding fix? Do not hesitate to open a separate issue if a new problem comes up later (or even to submit some PRs as well :) ).
Then, be careful with the file extension. We do need osh.pbf files, as they refer to "history" data, whilst osm.pbf files are standard OSM data (snapshots of OSM data at a specific date). You must get a valid OSM history dataset before running the pipeline.
Oh, Geofabrik seems to have changed its download policy since GDPR. You need an OSM contributor account and must log in on the Geofabrik website to be able to download osh.pbf files. I will update the project README accordingly.
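The extension rule described above can be sketched as a tiny check. The function name is hypothetical, and it only inspects the file name: it cannot tell whether an osm.pbf snapshot was simply renamed to osh.pbf, which is the situation encountered earlier in this thread.

```python
def classify_osm_file(filename):
    """Classify an OSM extract by its file name suffix only."""
    name = filename.lower()
    if name.endswith(".osh.pbf"):
        return "history"    # full edit history, what the pipeline expects
    if name.endswith(".osm.pbf"):
        return "snapshot"   # state of the data at one date, no history
    return "unknown"


if __name__ == "__main__":
    for candidate in ("greater-london.osh.pbf", "greater-london.osm.pbf"):
        print(candidate, "->", classify_osm_file(candidate))
```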
First, PR #3 did the trick on Windows, and downloading osh.pbf instead of osm.pbf works!
Here is the Luigi execution summary:
Scheduled 23 tasks of which:
* 1 present dependencies were encountered:
- 1 EditorCountByUser(datarep=data, n_top_editor=5)
* 22 ran successfully:
- 1 AddExtraInfoUserMetadata(datarep=data, dsname=antarctica-internal, n_top_editor=5)
- 1 AutoKMeans(datarep=data, dsname=antarctica-internal, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
- 1 AutoPCA(datarep=data, dsname=antarctica-internal, metadata_type=user, nb_min_dim=3, nb_max_dim=12, features=)
- 1 ChangeSetMetadataExtract(datarep=data, dsname=antarctica-internal)
- 1 KMeansAnalysis(datarep=data, dsname=antarctica-internal, metadata_type=user, nbmin_clusters=3, nbmax_clusters=8)
...
This progress looks :) because there were no failed tasks or missing external dependencies
===== Luigi Execution Summary =====
Thanks for your prompt response :)
Hi,
Thank you so much for the tool!
I have been following the README but still get failed tasks. I've installed the required dependencies, but the output still shows tasks that failed or were left pending. I am running this on Windows. Thanks in advance.