Closed suremangood closed 1 month ago
Did you download and extract the Geolife dataset as instructed in the Readme?
If you did, please provide me with a proper error trace in a code block and not as an image so that I can look into it. Also, as this error is happening within the multiprocessing code, could you deactivate the multiprocessing and run the method directly to potentially get a more helpful trace?
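One way to follow this suggestion is to swap the process pool for a plain loop, so that the exception is raised in the main process with its original traceback. This is only a minimal sketch: `process_user` and `run_all` are hypothetical stand-ins, not the repository's actual functions.

```python
from multiprocessing import Pool


def process_user(uid: int) -> str:
    """Hypothetical stand-in for the per-user worker function."""
    return f"{uid:03d}"


def run_all(uids, parallel: bool = True):
    """Run the worker over all user IDs, optionally without a process pool."""
    if not parallel:
        # Serial path: exceptions are raised directly in the main process,
        # so the traceback is not wrapped in a RemoteTraceback.
        return [process_user(uid) for uid in uids]
    with Pool() as pool:
        return list(pool.imap(process_user, uids, chunksize=1))


if __name__ == "__main__":
    print(run_all(range(3), parallel=False))
```

Calling `run_all(uids, parallel=False)` while debugging keeps the rest of the pipeline unchanged.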
Sorry for taking so long to reply to you, I didn't expect you to be so prompt. I did download the geolife data in the readme file. I will show you the error code part with pictures.
![Uploading Screenshot 2024-04-11 100013.png…]()
Could you please copy and paste the stack trace as text? It's very inconvenient to work with pictures.
Reading Files: 0%| | 0/182 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
Traceback (most recent call last):
File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 107, in _process_user
tid = re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file)).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 209, in <module>
get_geolife_trajectories()
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 163, in get_geolife_trajectories
trajs: List[pd.DataFrame] = get_geolife()
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 123, in get_geolife
for r in tqdm(pool.imap(_process_user, uids, chunksize=1), total=len(uids), desc='Reading Files'):
File "D:\app\anaconda\envs\roapt_model\lib\site-packages\tqdm\std.py", line 1195, in __iter__
for obj in iterable:
File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 870, in next
raise value
AttributeError: 'NoneType' object has no attribute 'group'
The above shows that the error occurs in line 107 of the code
The problem appears to be that `re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file))` returns `None`. Then, the line executes `None.group(1)`, which doesn't work, of course.
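To make this failure easier to diagnose, the match could be checked before calling `.group(1)`. The sketch below reuses the pattern from the traceback; the helper name `extract_tid` is hypothetical and not part of the repository.

```python
import re

# Pattern copied from the traceback above.
PATTERN = re.compile(r'.*/Trajectory/([0-9]*)\.plt')


def extract_tid(file_path: str) -> str:
    """Return the trajectory ID, or raise an error that names the offending path."""
    match = PATTERN.search(file_path)
    if match is None:
        # Show the exact string that failed, instead of an AttributeError on None.
        raise ValueError(f"Path does not match the expected layout: {file_path!r}")
    return match.group(1)
```

With this guard, the error message shows the exact string that failed to match, which usually reveals the path problem directly.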
Could you please verify that the directory structure is correct and the files have not been renamed?
The regex `.*/Trajectory/([0-9]*)\.plt` does not match your files.
Could you send me a screenshot of your directory structure?
It should look like this:
data/
├──geolife/
│ ├──Data/
│ │ ├──000/
│ │ │ ├──Trajectory/
│ │ │ │ ├──20081023025304.plt
│ │ │ │ ├──...
│ │ ├──...
│ ├──User Guide-1.3.pdf
Does it?
Is this so?
What is the content of the Trajectory directories? If this is your entire structure, where are the data files (`*.plt`)?
Did you perhaps make a mistake when unzipping the archive?
But the Trajectory folder can be opened, and it contains data files with the .plt suffix. However, they are not displayed when I use the tree terminal command.
Please see, these are the data files in the Trajectory folder on my computer.
Somehow, the code cannot find the files. Probably, there is an incorrect path somewhere in the code.
Could you please add a statement `print(tdir)` after line 103 of this file: preprocessing.geolife (https://github.com/erik-buchholz/RAoPT/blob/main/raopt/preprocessing/geolife.py), and verify that this is the path to the `Data/` directory? In your case, it should be `something/preprocessing/data/geolife/Data`.
If this is not the case, make sure to update `config.ini` so that the path is correct (line 56 in https://github.com/erik-buchholz/RAoPT/blob/main/config/config.ini). The path is relative to the base directory of the repository.
I assume the problem is that you downloaded the dataset into `preprocessing/data/` instead of `data/`.
Alternatively, you could move your unzipped directory into the main `data/` directory.
I.e., assume you cloned the repo into `RAoPT/`. Then, it should look like this:
RAoPT
├── config
│ ├── ...
├── data
│ ├── geolife
│ │ ├── Data
├── ...
├── environment
│ ├── ...
├── LICENCE
├── print_results.py
├── raopt
│ ├── ...
Does that make sense?
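A quick way to verify the layout is to resolve the configured path against the repository root and count the `.plt` files found below it. This is a sketch under assumptions: `count_plt_files` is a hypothetical helper, and `repo_root` and `dataset_path` stand in for wherever the repository was cloned and the value set in `config.ini`.

```python
from pathlib import Path


def count_plt_files(repo_root: str, dataset_path: str) -> int:
    """Count *.plt files under <repo_root>/<dataset_path>/<user>/Trajectory/."""
    data_dir = Path(repo_root) / dataset_path
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {data_dir}")
    # One directory per user, each with a Trajectory/ subdirectory of .plt files.
    return sum(1 for _ in data_dir.glob("*/Trajectory/*.plt"))
```

A result of 0, or a `FileNotFoundError`, would indicate that the dataset is not where the configuration expects it.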
Thank you for your patient guidance, I will try it.
Reading Files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 558.31it/s]
Traceback (most recent call last):
File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in
I placed the data file in the home directory but the new error I sent you above appeared.
What command are you using to start the preprocessing?
Script used in terminal: python -m raopt.preprocessing.geolife
This is the output of executing print(tdir)
And when you execute `python -m raopt.preprocessing.geolife`, are you located in the repository's main directory?
Something appears to be wrong with your paths. If you look at your screenshot, there is a `/` missing between `data` and `000`. But if you look into `config.ini`, you see that the path is `DATASET_PATH = data/geolife/data/`, so the `/` is still there. Can you check at the top of geolife.py whether the variable `data_dir` still ends with `/`? If not, try to add it manually.
For example, you could replace
tdir = data_dir + f"{uid:03d}/Trajectory/"
by
tdir = data_dir + f"/{uid:03d}/Trajectory/"
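As an alternative to string concatenation, joining the parts with `pathlib` gives the same result whether or not the configured directory ends with a separator. This is a sketch, not the repository's code: `trajectory_dir` is a hypothetical helper, and unlike the original `tdir` the result has no trailing `/`.

```python
from pathlib import PurePosixPath


def trajectory_dir(data_dir: str, uid: int) -> str:
    """Join the dataset directory and user ID, with or without a trailing slash."""
    # The '/' operator inserts separators as needed, so a missing or extra
    # trailing slash in data_dir does not change the result.
    return str(PurePosixPath(data_dir) / f"{uid:03d}" / "Trajectory")
```

Here `trajectory_dir("data/geolife/data", 0)` and `trajectory_dir("data/geolife/data/", 0)` yield the same string.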
I tried your method and added / to the geolife path in config.ini. I ran the script from the main directory and still got the following error:
(roapt_model) F:\RAoPT-main>python -m raopt.preprocessing.geolife
Reading Files: 0%| | 0/182 [00:00<?, ?it/s]
data/geolife/data/000/Trajectory/
data/geolife/data/001/Trajectory/
Reading Files: 0%| | 0/182 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 108, in _process_user
tid = re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file)).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in
Then I ran geolife.py directly in PyCharm and got the following error.
Reading Files: 100%|██████████| 182/182 [00:00<00:00, 935.75it/s]
data/geolife/data/000/Trajectory/
...
data/geolife/data/173/Trajectory/
data/geolife/data/174/Trajectory/
data/geolife/data/175/Trajectory/
data/geolife/data/176/Trajectory/
data/geolife/data/177/Trajectory/
data/geolife/data/178/Trajectory/
data/geolife/data/179/Trajectory/
data/geolife/data/180/Trajectory/
data/geolife/data/181/Trajectory/
Traceback (most recent call last):
File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in
Process finished with exit code 1
Clearly, the issue is that something is wrong with the paths, as the files are not found by the script. Unfortunately, I cannot reproduce the issue; on both my Mac and my Ubuntu machine, the code works as is.
Are you running the code on Windows? If so, that might be the problem. Paths on Windows are formatted differently, so you would have to change all paths correspondingly. I have never tested this code on Windows, so it might not work at all.
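The following sketch illustrates how the path format could cause the `None` match. It assumes that `file` is a `pathlib` path, which the `str(file)` call in the traceback suggests but does not confirm; the file name is taken from the example directory structure above. `PureWindowsPath` is used so that the example behaves the same on any operating system.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Pattern copied from the traceback.
PATTERN = r'.*/Trajectory/([0-9]*)\.plt'

windows_file = PureWindowsPath("data/geolife/data/000/Trajectory/20081023025304.plt")
posix_file = PurePosixPath("data/geolife/data/000/Trajectory/20081023025304.plt")

# str() of a Windows path uses backslashes, so the '/'-based pattern finds nothing.
windows_match = re.search(PATTERN, str(windows_file))
# The same pattern matches the POSIX form of the path.
posix_match = re.search(PATTERN, str(posix_file))
# as_posix() renders the path with forward slashes on any platform.
fixed_match = re.search(PATTERN, windows_file.as_posix())

print(windows_match, posix_match.group(1), fixed_match.group(1))
```

If this is indeed the cause, matching against `file.as_posix()` instead of `str(file)` would be one possible workaround on Windows.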
If you are using Mac/Linux and it doesn't work, please try adding a few debug statements to `_process_user`. Try to find out why the regex returns `None`. Maybe call it in a separate Python file to isolate the method. I'm afraid I don't have enough information to provide you with a solution at the moment.
Sorry to bother you again. I was not willing to stop there, so after a while I used Microsoft's Ubuntu subsystem to run your code. During training, I executed the script: python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.csv tmp/example/train_p.csv tmp/example/parameters.hdf5 100 and found that I was missing the file parameters.hdf5. What do you think is going on?
This is the execution process:
(raopt) $ python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.csv tmp/example/train_p.csv tmp/example/parameters.hdf5 100
Using GPU 0!
2024-04-24 19:36:05.368970: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[2024-04-24 19:36:07,248][INFO ] Loading Data... (train.py:62)
[2024-04-24 19:36:07,249][INFO ] Reading Trajectories from tmp/example/train_o.csv. (helpers.py:279)
[2024-04-24 19:36:17,060][INFO ] Reading Trajectories from tmp/example/train_p.csv. (helpers.py:279)
[2024-04-24 19:36:27,550][INFO ] Compute Parameters... (train.py:70)
[2024-04-24 19:36:27,561][INFO ] Reading Trajectories from tmp/example/test_p.csv. (helpers.py:279)
[2024-04-24 19:36:41,501][INFO ] Reference Point: (39.94, 116.43) (train.py:76)
[2024-04-24 19:36:54,091][INFO ] Scale Factor: (6.01, 7.55) (train.py:78)
[2024-04-24 19:36:54,101][INFO ] Encoding trajectories... (encoder.py:148)
[2024-04-24 19:37:04,399][INFO ] Encoded trajectories in 10s. (encoder.py:160)
[2024-04-24 19:37:04,403][INFO ] Encoding trajectories... (encoder.py:148)
[2024-04-24 19:37:16,659][INFO ] Encoded trajectories in 12s. (encoder.py:160)
Hi,
Your trace doesn't show the error. However, train.py doesn't need the parameters file to exist, as it will create it. Maybe the directory that should contain the file doesn't exist?
Create `tmp/example/` and try again. Otherwise, show the full error trace.
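If a missing output directory turns out to be the cause, it could also be created from Python before the file is written. A minimal sketch; `ensure_parent_dir` is a hypothetical helper and not part of the repository:

```python
from pathlib import Path


def ensure_parent_dir(file_path: str) -> Path:
    """Create the parent directory of file_path if it is missing."""
    parent = Path(file_path).parent
    # parents=True creates intermediate directories; exist_ok=True makes the
    # call a no-op when the directory is already there.
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

For example, `ensure_parent_dir("tmp/example/parameters.hdf5")` would create `tmp/example/` relative to the current working directory.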
This is the result of running train.py using the script. Do you see any problems?
[2024-04-25 09:15:06,981][INFO ] Model Summary: (train.py:102)
Model: "RAoPT"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input_Encoding (InputLayer) [(None, 100, 33)] 0
__________________________________________________________________________________________________
Mask_Padding (Masking) (None, 100, 33) 0 Input_Encoding[0][0]
__________________________________________________________________________________________________
tf.split (TFOpLambda) [(None, 100, 2), (No 0 Mask_Padding[0][0]
__________________________________________________________________________________________________
Embedding_latlon (TimeDistribut (None, 100, 64) 192 tf.split[0][0]
__________________________________________________________________________________________________
Embedding_hour (TimeDistributed (None, 100, 24) 600 tf.split[0][1]
__________________________________________________________________________________________________
Embedding_dow (TimeDistributed) (None, 100, 7) 56 tf.split[0][2]
__________________________________________________________________________________________________
Join_Features (Concatenate) (None, 100, 95) 0 Embedding_latlon[0][0]
Embedding_hour[0][0]
Embedding_dow[0][0]
__________________________________________________________________________________________________
Feature_Fusion (Dense) (None, 100, 100) 9600 Join_Features[0][0]
__________________________________________________________________________________________________
Bidirectional_LSTM (Bidirection (None, 100, 200) 160800 Feature_Fusion[0][0]
__________________________________________________________________________________________________
Output_lat (TimeDistributed) (None, 100, 1) 201 Bidirectional_LSTM[0][0]
__________________________________________________________________________________________________
Output_lon (TimeDistributed) (None, 100, 1) 201 Bidirectional_LSTM[0][0]
__________________________________________________________________________________________________
Output_lat_scaled (TimeDistribu (None, 100, 1) 0 Output_lat[0][0]
__________________________________________________________________________________________________
Output_lon_scaled (TimeDistribu (None, 100, 1) 0 Output_lon[0][0]
__________________________________________________________________________________________________
Output_Concatenation (Concatena (None, 100, 2) 0 Output_lat_scaled[0][0]
Output_lon_scaled[0][0]
==================================================================================================
Total params: 171,650
Trainable params: 171,650
Non-trainable params: 0
__________________________________________________________________________________________________
None
0epoch [00:00, ?epoch/s][1] 882 killed python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.cstch [00:00, ?batch/s]
No, that looks good, doesn't it? Is the model training?
One side note, if you are working on trajectories in general: we also published a follow-up work with multiple generative models: https://github.com/erik-buchholz/SoK-TrajGen
While it still uses RAoPT as it is in this repo, we provide the other models in PyTorch, which is (from my experience) a bit more beginner-friendly. This repo also includes the pre-processed datasets, so you won't need to worry about that again. :)
@suremangood Please let me know once you get everything to work so I can mark this issue as completed. :)
@suremangood
For the future, I have added the processed datasets via Git LFS so that the data can be used without preprocessing if required.
They are contained in `processed_csv/`.
I ran the script called geolife and got an AttributeError: 'NoneType' object has no attribute 'group'.