erik-buchholz / RAoPT

Reconstruction Attack on Differential Private Trajectory Protection Mechanisms

geolife.py error #1

Closed suremangood closed 1 month ago

suremangood commented 7 months ago

I ran geolife.py and got the following error: AttributeError: 'NoneType' object has no attribute 'group'.

suremangood commented 7 months ago

123

erik-buchholz commented 7 months ago

Did you download and extract the Geolife dataset as instructed in the Readme?

If you did, please provide me with a proper error trace in a code block and not as an image so that I can look into it. Also, as this error is happening within the multiprocessing code, could you deactivate the multiprocessing and run the method directly to potentially get a more helpful trace?

suremangood commented 7 months ago

Sorry for taking so long to reply; I didn't expect you to be so prompt. I did download the Geolife data as described in the Readme. I will show you the relevant part of the error with pictures.

suremangood commented 7 months ago

![Uploading Screenshot 2024-04-11 100013.png…]()

erik-buchholz commented 7 months ago

Could you please copy and paste the stack trace as text? It's very inconvenient to work with pictures.

suremangood commented 7 months ago

Reading Files:   0%|          | 0/182 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
Traceback (most recent call last):
  File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 107, in _process_user
    tid = re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file)).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 209, in <module>
    get_geolife_trajectories()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 163, in get_geolife_trajectories
    trajs: List[pd.DataFrame] = get_geolife()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 123, in get_geolife
    for r in tqdm(pool.imap(_process_user, uids, chunksize=1), total=len(uids), desc='Reading Files'):
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'group'

suremangood commented 7 months ago

The above shows that the error occurs in line 107 of the code

erik-buchholz commented 7 months ago

The problem appears to be that re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file)) returns None. The line then executes None.group(1), which of course doesn't work.

Could you please verify that the directory structure is correct and that the files have not been renamed? The regex .*/Trajectory/([0-9]*)\.plt does not match your files. Could you send me a screenshot of your directory structure? It should look like this:

data/
├──geolife/
│  ├──Data/
│  │  ├──000/
│  │  │  ├──Trajectory/
│  │  │  │  ├──20081023025304.plt
│  │  │  │  ├──...
│  │  ├──...
│  ├──User Guide-1.3.pdf

Does it?
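The failure described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `extract_tid` is a hypothetical helper that applies the same pattern as line 107 but raises a descriptive error instead of calling `.group(1)` on `None`. The second example path is made up to show a layout that does not match.

```python
import re

# Same pattern as in the traceback above.
TID_PATTERN = re.compile(r'.*/Trajectory/([0-9]*)\.plt')


def extract_tid(file) -> str:
    """Return the trajectory ID from a path, or raise a descriptive error."""
    match = TID_PATTERN.search(str(file))
    if match is None:
        # re.search returns None when nothing matches; calling .group(1) on
        # that None is what produces the AttributeError in the trace.
        raise ValueError(f"Unexpected trajectory path: {file!r}")
    return match.group(1)


if __name__ == "__main__":
    # Matches the expected layout.
    print(extract_tid("data/geolife/Data/000/Trajectory/20081023025304.plt"))
    # Hypothetical path without the Trajectory/ directory: no match.
    try:
        extract_tid("data/geolife/Data/000/20081023025304.plt")
    except ValueError as err:
        print(err)
```

With a guard like this, the message names the offending path, which is usually easier to act on than the bare AttributeError.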

suremangood commented 7 months ago

picture2 2024-04-11 101942

suremangood commented 7 months ago

Is this so?

erik-buchholz commented 7 months ago

What is the content of the Trajectory directories? If this is your entire structure, where are the data files (*.plt)? Did you perhaps make a mistake when unzipping the archive?
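One way to answer this without screenshots is to count the *.plt files per user directory. This is only a sketch under assumptions: `count_plt_files` is a hypothetical helper, and the path `data/geolife/Data` is taken from the directory tree shown earlier in this thread.

```python
from pathlib import Path


def count_plt_files(data_dir) -> dict:
    """Map each user directory name to the number of .plt files it holds."""
    counts = {}
    for user_dir in sorted(Path(data_dir).iterdir()):
        if user_dir.is_dir():
            # A missing Trajectory/ directory simply yields a count of 0.
            counts[user_dir.name] = len(list((user_dir / "Trajectory").glob("*.plt")))
    return counts


if __name__ == "__main__":
    # Assumed location, relative to the current working directory.
    root = Path("data/geolife/Data")
    if root.is_dir():
        for uid, n in count_plt_files(root).items():
            print(uid, n)
    else:
        print(f"{root.resolve()} does not exist")
```

If every count is 0, or the root does not exist, the extraction or the path is the likely culprit rather than the preprocessing code.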

suremangood commented 7 months ago

But the Trajectory directory can be opened, and it contains data files with the .plt suffix. However, they are not displayed when I use the tree command in the terminal.

suremangood commented 7 months ago

picture3 2024-04-11 103222

suremangood commented 7 months ago

Please see, these are the data files in the Trajectory directory on my computer.

erik-buchholz commented 7 months ago

Somehow, the code cannot find the files. Probably, there is an incorrect path somewhere in the code. Could you please add a statement print(tdir) after line 103 of preprocessing.geolife (https://github.com/erik-buchholz/RAoPT/blob/main/raopt/preprocessing/geolife.py) and verify that this is the path to the Data/ directory? In your case, it should be something/preprocessing/data/geolife/Data. If this is not the case, make sure to update config.ini so that the path is correct (line 56 in https://github.com/erik-buchholz/RAoPT/blob/main/config/config.ini).

The path is relative to the base directory of the repository. I assume the problem is that you downloaded the dataset into preprocessing/data/ instead of data/.
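Because the configured path is relative, where it points depends on the directory the script is started from. The sketch below illustrates this; `resolve_from` is a hypothetical helper and `data/geolife/Data/` is an assumed value that should be replaced by whatever your config.ini contains.

```python
import os
from pathlib import Path


def resolve_from(base_dir, relative_path) -> Path:
    """Return the absolute location a relative path refers to when run from base_dir."""
    return (Path(base_dir) / relative_path).resolve()


if __name__ == "__main__":
    rel = "data/geolife/Data/"  # assumed value; use the one from your config.ini
    target = resolve_from(os.getcwd(), rel)
    print("cwd:     ", os.getcwd())
    print("resolves:", target)
    print("exists:  ", target.is_dir())
```

Running this from the repository root and from a subdirectory shows two different resolved locations, only one of which exists.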

erik-buchholz commented 7 months ago

Alternatively, you could move your unzipped directory into the main data/ directory. For example, assume you cloned the repo into RAoPT/. Then it should look like this:

RAoPT
├── config
│   ├── ...
├── data
│   ├── geolife
│   │   ├── Data
│   │   │   ├── ...
├── environment
│   ├── ...
├── LICENCE
├── print_results.py
├── raopt
│   ├── ...

Does that make sense?

suremangood commented 7 months ago

Thank you for your patient guidance, I will try it.

suremangood commented 7 months ago

Reading Files: 100%|██████████| 182/182 [00:00<00:00, 558.31it/s]
Traceback (most recent call last):
  File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in <module>
    get_geolife_trajectories()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 164, in get_geolife_trajectories
    trajs: List[pd.DataFrame] = get_geolife()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 128, in get_geolife
    trajs = _clean_trajectories(trajs)
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 76, in _clean_trajectories
    bbox = find_bbox(trajs, 0.95)
  File "F:\RAoPT-main\raopt\utils\helpers.py", line 135, in find_bbox
    single_db = pd.concat(trajs)
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\core\reshape\concat.py", line 347, in concat
    op = _Concatenator(
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\core\reshape\concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

suremangood commented 7 months ago

I placed the data file in the home directory but the new error I sent you above appeared.

suremangood commented 7 months ago

Screenshot 2024-04-11 111143

erik-buchholz commented 7 months ago

What command are you using to start the preprocessing?

suremangood commented 7 months ago

Script used in terminal: python -m raopt.preprocessing.geolife

suremangood commented 7 months ago

Screenshot 2024-04-11 110125

suremangood commented 7 months ago

This is the output of executing print(tdir)

erik-buchholz commented 7 months ago

And when you execute python -m raopt.preprocessing.geolife, are you located in the repository's main directory?

Something appears to be wrong with your paths. If you look at your screenshot, there is a / missing between data and 000. But if you look into config.ini, you see that the path is DATASET_PATH = data/geolife/data/ so the trailing / is still there. Can you check at the top of geolife.py to see if the variable data_dir still ends with /? If not, try to add it manually.

For example, you could replace

tdir = data_dir + f"{uid:03d}/Trajectory/"

by

tdir = data_dir + f"/{uid:03d}/Trajectory/"
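As an alternative to adding the slash by hand, the path can be built from its components so that a missing or extra trailing separator in the configured directory no longer matters. This is a sketch, not the repository's code: `user_trajectory_dir` is a hypothetical helper, and only the names `data_dir` and `uid` come from the lines above.

```python
from pathlib import Path


def user_trajectory_dir(data_dir, uid: int) -> Path:
    """Build <data_dir>/<uid as 3 digits>/Trajectory without manual slashes."""
    return Path(data_dir) / f"{uid:03d}" / "Trajectory"


if __name__ == "__main__":
    # Both spellings of the base directory produce the same result.
    print(user_trajectory_dir("data/geolife/Data", 0))
    print(user_trajectory_dir("data/geolife/Data/", 0))
```

Note that `str()` of the resulting `Path` uses the separator of the operating system it runs on, so code that later matches the string against a pattern with `/` has to take that into account.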
suremangood commented 7 months ago

I tried your method and added / to the path of geolife in config.ini. I ran the script from the main directory and still got the following error:

(roapt_model) F:\RAoPT-main>python -m raopt.preprocessing.geolife
Reading Files:   0%|          | 0/182 [00:00<?, ?it/s]
data/geolife/data/000/Trajectory/
data/geolife/data/001/Trajectory/
Reading Files:   0%|          | 0/182 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 108, in _process_user
    tid = re.search(r'.*/Trajectory/([0-9]*)\.plt', str(file)).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\app\anaconda\envs\roapt_model\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in <module>
    get_geolife_trajectories()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 164, in get_geolife_trajectories
    trajs: List[pd.DataFrame] = get_geolife()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 124, in get_geolife
    for r in tqdm(pool.imap(_process_user, uids, chunksize=1), total=len(uids), desc='Reading Files'):
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\app\anaconda\envs\roapt_model\lib\multiprocessing\pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'group'

suremangood commented 7 months ago

Then I ran geolife.py directly in PyCharm, and the following error appeared:

Reading Files: 100%|██████████| 182/182 [00:00<00:00, 935.75it/s]
data/geolife/data/000/Trajectory/
...
data/geolife/data/173/Trajectory/
data/geolife/data/174/Trajectory/
data/geolife/data/175/Trajectory/
data/geolife/data/176/Trajectory/
data/geolife/data/177/Trajectory/
data/geolife/data/178/Trajectory/
data/geolife/data/179/Trajectory/
data/geolife/data/180/Trajectory/
data/geolife/data/181/Trajectory/
Traceback (most recent call last):
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 210, in <module>
    get_geolife_trajectories()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 164, in get_geolife_trajectories
    trajs: List[pd.DataFrame] = get_geolife()
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 128, in get_geolife
    trajs = _clean_trajectories(trajs)
  File "F:\RAoPT-main\raopt\preprocessing\geolife.py", line 76, in _clean_trajectories
    bbox = find_bbox(trajs, 0.95)
  File "F:\RAoPT-main\raopt\utils\helpers.py", line 135, in find_bbox
    single_db = pd.concat(trajs)
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\core\reshape\concat.py", line 347, in concat
    op = _Concatenator(
  File "D:\app\anaconda\envs\roapt_model\lib\site-packages\pandas\core\reshape\concat.py", line 404, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

Process finished with exit code 1

erik-buchholz commented 7 months ago

Clearly, the issue is that something is wrong with the paths, as the files are not found by the script. Unfortunately, I cannot reproduce the issue; on both my Mac and my Ubuntu machine, the code works as is.

Are you running the code on Windows? If so, that might be the problem. Paths on Windows are formatted differently, so you would have to change all paths correspondingly. I have never tested this code on Windows, so it might not work at all.

If you are using Mac/Linux and it doesn't work, please try adding a few debug statements to _process_user. Try to find out why the regex returns None. Maybe call it in a separate Python file and isolate the method. I'm afraid I don't have enough information to provide you with a solution at the moment.
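Isolating the regex in a separate file could look like the sketch below. It applies the pattern from the traceback to the same relative path rendered with POSIX and with Windows separators. `tid_or_none` is a hypothetical helper, and whether separators are the actual cause in this setup is not confirmed in this thread; the sketch only shows that a pattern containing `/` cannot match a string that uses `\`.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Pattern copied from the traceback (geolife.py, _process_user).
PATTERN = r'.*/Trajectory/([0-9]*)\.plt'


def tid_or_none(path_str: str):
    """Return the captured trajectory ID, or None if the pattern does not match."""
    match = re.search(PATTERN, path_str)
    return match.group(1) if match else None


if __name__ == "__main__":
    rel = "data/geolife/Data/000/Trajectory/20081023025304.plt"
    posix = PurePosixPath(rel)
    windows = PureWindowsPath(rel)  # str() renders this with backslashes

    print(str(posix), "->", tid_or_none(str(posix)))
    print(str(windows), "->", tid_or_none(str(windows)))
    # Two separator-independent alternatives:
    print(windows.as_posix(), "->", tid_or_none(windows.as_posix()))
    print("stem:", windows.stem)
```

Matching against `as_posix()` or reading the ID from `stem` avoids depending on the platform's separator at all.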

suremangood commented 6 months ago

Sorry to bother you again. I was not willing to stop there, so after a while I used Microsoft's Ubuntu subsystem to run your code. During training, I executed the script: python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.csv tmp/example/train_p.csv tmp/example/parameters.hdf5 100 and found that I was missing the file parameters.hdf5. What do you think is going on?

suremangood commented 6 months ago

This is the execution process:

(raopt)
root @ LAPTOP-O1O74SAO in /mnt/c/Users/18669/Desktop/RAoPT-main [19:33:21] C:1
$ python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.csv tmp/example/train_p.csv tmp/example/parameters.hdf5 100
Using GPU 0!
2024-04-24 19:36:05.368970: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
[2024-04-24 19:36:07,248][INFO ] Loading Data... (train.py:62)
[2024-04-24 19:36:07,249][INFO ] Reading Trajectories from tmp/example/train_o.csv. (helpers.py:279)
[2024-04-24 19:36:17,060][INFO ] Reading Trajectories from tmp/example/train_p.csv. (helpers.py:279)
[2024-04-24 19:36:27,550][INFO ] Compute Parameters... (train.py:70)
[2024-04-24 19:36:27,561][INFO ] Reading Trajectories from tmp/example/test_p.csv. (helpers.py:279)
[2024-04-24 19:36:41,501][INFO ] Reference Point: (39.94, 116.43) (train.py:76)
[2024-04-24 19:36:54,091][INFO ] Scale Factor: (6.01, 7.55) (train.py:78)
[2024-04-24 19:36:54,101][INFO ] Encoding trajectories... (encoder.py:148)
[2024-04-24 19:37:04,399][INFO ] Encoded trajectories in 10s. (encoder.py:160)
[2024-04-24 19:37:04,403][INFO ] Encoding trajectories... (encoder.py:148)
[2024-04-24 19:37:16,659][INFO ] Encoded trajectories in 12s. (encoder.py:160)

erik-buchholz commented 6 months ago

Hi,

Your trace doesn't show the error. But train.py doesn't need the parameters file, as it will create it. Maybe the directory that should contain the file doesn't exist? Create tmp/example/ and try again. Otherwise, please show the full error trace.
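Creating the directory can also be done from Python before starting the training. This is a small sketch; `ensure_parent_dir` is a hypothetical helper, and the path is the parameters file from the command above.

```python
from pathlib import Path


def ensure_parent_dir(file_path) -> Path:
    """Create the directory that should contain file_path if it is missing."""
    parent = Path(file_path).parent
    # parents=True creates intermediate directories; exist_ok=True makes the
    # call a no-op when the directory is already there.
    parent.mkdir(parents=True, exist_ok=True)
    return parent


if __name__ == "__main__":
    print(ensure_parent_dir("tmp/example/parameters.hdf5"))
```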

suremangood commented 6 months ago

This is the result of running train.py using the script. Do you see any problems?

[2024-04-25 09:15:06,981][INFO   ] Model Summary: (train.py:102)
Model: "RAoPT"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
Input_Encoding (InputLayer)     [(None, 100, 33)]    0
__________________________________________________________________________________________________
Mask_Padding (Masking)          (None, 100, 33)      0           Input_Encoding[0][0]
__________________________________________________________________________________________________
tf.split (TFOpLambda)           [(None, 100, 2), (No 0           Mask_Padding[0][0]
__________________________________________________________________________________________________
Embedding_latlon (TimeDistribut (None, 100, 64)      192         tf.split[0][0]
__________________________________________________________________________________________________
Embedding_hour (TimeDistributed (None, 100, 24)      600         tf.split[0][1]
__________________________________________________________________________________________________
Embedding_dow (TimeDistributed) (None, 100, 7)       56          tf.split[0][2]
__________________________________________________________________________________________________
Join_Features (Concatenate)     (None, 100, 95)      0           Embedding_latlon[0][0]
                                                                 Embedding_hour[0][0]
                                                                 Embedding_dow[0][0]
__________________________________________________________________________________________________
Feature_Fusion (Dense)          (None, 100, 100)     9600        Join_Features[0][0]
__________________________________________________________________________________________________
Bidirectional_LSTM (Bidirection (None, 100, 200)     160800      Feature_Fusion[0][0]
__________________________________________________________________________________________________
Output_lat (TimeDistributed)    (None, 100, 1)       201         Bidirectional_LSTM[0][0]
__________________________________________________________________________________________________
Output_lon (TimeDistributed)    (None, 100, 1)       201         Bidirectional_LSTM[0][0]
__________________________________________________________________________________________________
Output_lat_scaled (TimeDistribu (None, 100, 1)       0           Output_lat[0][0]
__________________________________________________________________________________________________
Output_lon_scaled (TimeDistribu (None, 100, 1)       0           Output_lon[0][0]
__________________________________________________________________________________________________
Output_Concatenation (Concatena (None, 100, 2)       0           Output_lat_scaled[0][0]
                                                                 Output_lon_scaled[0][0]
==================================================================================================
Total params: 171,650
Trainable params: 171,650
Non-trainable params: 0
__________________________________________________________________________________________________
None
0epoch [00:00, ?epoch/s]
[1]    882 killed     python3 -m raopt.ml.train -b 512 -e 200 -l 0.001 -s 20 tmp/example/train_o.cstch [00:00, ?batch/s]

erik-buchholz commented 6 months ago

No, that looks good, doesn't it? Is the model training?

erik-buchholz commented 6 months ago

One side note, if you are working on trajectories in general: we also published follow-up work to this with multiple generative models: https://github.com/erik-buchholz/SoK-TrajGen

While it still uses RAoPT as it is in this repo, we provide the other models in PyTorch, which is (from my experience) a bit more beginner-friendly. This repo also includes the pre-processed datasets, so you won't need to worry about that again. :)

erik-buchholz commented 6 months ago

@suremangood Please let me know once you get everything to work so I can mark this issue as completed. :)

erik-buchholz commented 1 month ago

@suremangood

For the future, I have added the processed datasets via Git LFS so that the data can be used without preprocessing if required. They are contained in processed_csv/