AMP-SCZ / dpanonymize

Personally identifiable information remover
Apache License 2.0

PII removal of all files under a given directory #9

Closed tashrifbillah closed 3 years ago

tashrifbillah commented 3 years ago

Regarding the section of README with this issue title:

cd /data/predict/tb571_lochness_test/ANON-PHOENIX/PROTECTED/FAKE_LA/raw

Is this command:

dpanon.py \
--in_dir surveys/ \
--out_dir /data/predict/tb571_lochness_test/ANON-PHOENIX/GENERAL/FAKE_LA/processed/surveys \
--datatype survey

intended to anonymize this tree?

surveys/
├── LA06749
│   └── LA06749.FAKE.json
├── LA10782
│   └── LA10782.FAKE.json
├── LA24153
│   └── LA24153.FAKE.json
├── LA27050
│   └── LA27050.FAKE.json
├── LA54257
│   └── LA54257.FAKE.json
└── LA99243
    └── LA99243.FAKE.json

I don't see it doing so.

tashrifbillah commented 3 years ago

This does not work either:

dpanon.py --phoenix_root /data/predict/tb571_lochness_test/ANON-PHOENIX/ --datatype survey

tashrifbillah commented 3 years ago

I see what you did here--you are looking for in_dir/*json. But I know how to modify that glob to find JSONs many directories deep. Happy to discuss over a call.
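A minimal sketch of the difference between the two globs, assuming the script collects its inputs with a non-recursive pathlib pattern as described above. The helper names and the demo tree are hypothetical and are not part of dpanonymize:

```python
from pathlib import Path
import tempfile


def find_json_flat(in_dir):
    """Return JSON files located directly under in_dir (non-recursive)."""
    return sorted(Path(in_dir).glob('*.json'))


def find_json_recursive(in_dir):
    """Return JSON files at any depth under in_dir."""
    return sorted(Path(in_dir).rglob('*.json'))


def make_demo_tree(root):
    """Create a tiny tree shaped like the surveys/ example (made-up data)."""
    for subject in ('LA06749', 'LA10782'):
        subject_dir = Path(root) / 'surveys' / subject
        subject_dir.mkdir(parents=True)
        (subject_dir / f'{subject}.FAKE.json').write_text('{}')
    return Path(root) / 'surveys'


with tempfile.TemporaryDirectory() as tmp:
    surveys = make_demo_tree(tmp)
    flat = find_json_flat(surveys)
    deep = find_json_recursive(surveys)
    print(len(flat), len(deep))
```

On this tree the flat glob finds nothing, because surveys/ contains only subject directories, while the recursive one finds both JSON files.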

kcho commented 3 years ago

This is not how I intended users to use dpanonymize. --in_dir expects a directory that holds multiple files of a single datatype directly under it, not in nested subdirectories. We will likely not use this option for U24.

kcho commented 3 years ago

Regarding the --phoenix_root command that does not work either: thanks for pointing this out. I'll have a look at what is going on with it.

tashrifbillah commented 3 years ago

I am trying to debug this line. Consider the following example:

In [1]: from pathlib import Path

In [2]: obj = Path(r'C:\Users\tb571\Documents').glob('*txt')

In [3]: obj
Out[3]: <generator object Path.glob at 0x0000023F6138FF48>

How do I examine the contents of obj?

kcho commented 3 years ago

Regarding the --phoenix_root command: can you try it with the --bids option, please?

kcho commented 3 years ago

To examine the contents of obj, I'd use list(obj).
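Path.glob returns a lazy iterator, which is why the session above prints a generator object rather than file names. A small sketch of materializing it, using a temporary directory with made-up file names:

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.txt', 'b.txt', 'c.md'):
        (Path(tmp) / name).write_text('')

    obj = Path(tmp).glob('*txt')              # lazy iterator, nothing listed yet
    first_pass = sorted(p.name for p in obj)  # consuming it yields the matches
    second_pass = list(obj)                   # already exhausted, so this is empty

print(first_pass)
print(second_pass)
```

The iterator can be consumed only once, so keep the list if it is needed again. Inside pdb, a bare list is interpreted as the debugger's own source-listing command; prefixing the expression, as in p list(obj) or !list(obj), evaluates it instead.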

tashrifbillah commented 3 years ago

Regarding list(obj): I keep forgetting that it does not work as-is under python -m pdb, where list is a debugger command. Anyway, I am able to examine it now.

I think I found the error. Lochness calls it surveys but here you are calling it survey. Hence, glob never finds the intended directory.

P.S. Can you comment without hitting reply, or quote only the relevant portion? Otherwise, the thread keeps getting longer.
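The survey/surveys mismatch described in this comment can be reproduced in isolation. This is a sketch only: the helper name and the directory layout are hypothetical, and the pattern is borrowed from the glob expression quoted later in the thread.

```python
from pathlib import Path
import tempfile


def find_module_dirs(root_dir, module):
    """Return directories named `module` exactly two levels below root_dir."""
    return sorted(Path(root_dir).glob(f'*/*/{module}'))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <root>/<STUDY>/raw/surveys, with the plural name on disk
    (Path(tmp) / 'FAKE_LA' / 'raw' / 'surveys').mkdir(parents=True)

    singular = find_module_dirs(tmp, 'survey')    # name the script searched for
    plural = find_module_dirs(tmp, 'surveys')     # name that exists on disk

print(len(singular), len(plural))
```

A glob that matches nothing returns an empty result instead of raising, so the misspelling fails silently: the singular name yields no directories and the plural name yields one.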

kcho commented 3 years ago

Good point about survey vs. surveys. Are you already working on the change, by any chance? If not, I'll work on it soon.

kcho commented 3 years ago

Is this https://github.com/AMP-SCZ/dpanonymize/issues/9#issuecomment-877233447 resolved?

tashrifbillah commented 3 years ago

My data are not organized according to BIDS. Regardless, it should fail because of the survey/surveys spelling mismatch. That said, I haven't tried the --bids option. I would have to pull data with Lochness in the BIDS layout to be able to try that, maybe later.

kcho commented 3 years ago

Fixed (survey -> surveys): https://github.com/AMP-SCZ/dpanonymize/commit/7983ac468f058450dfab7cf2007125cb4e93d2c1

tashrifbillah commented 3 years ago

Both --phoenix_root and --in_dir options work now.

tashrifbillah commented 3 years ago

On the other hand, I am not equipped to test --bids as of now, but let's answer the following question to verify that route:

Does list(root_dir.glob(f'*/{module}')) if BIDS else list(root_dir.glob(f'*/*/{module}')) translate to:

PHOENIX/GENERAL/{STUDY}/{SUBJECT}/surveys if BIDS else PHOENIX/GENERAL/{STUDY}/{SUBJECT}/raw/surveys ?
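One way to check what each pattern can match is to run both against a throwaway tree. The sketch below mirrors the expression quoted above in a hypothetical helper; the directory names are placeholders, and what root_dir points to in dpanonymize is not stated in this thread.

```python
from pathlib import Path
import tempfile


def module_dirs(root_dir, module, bids):
    """Mirror the conditional glob expression quoted in the question."""
    pattern = f'*/{module}' if bids else f'*/*/{module}'
    return list(Path(root_dir).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Placeholder directories at two different depths below root
    (root / 'one' / 'surveys').mkdir(parents=True)
    (root / 'one' / 'two' / 'surveys').mkdir(parents=True)

    shallow = [p.relative_to(root).as_posix()
               for p in module_dirs(root, 'surveys', bids=True)]
    deep = [p.relative_to(root).as_posix()
            for p in module_dirs(root, 'surveys', bids=False)]

print(shallow)
print(deep)
```

Each * matches exactly one path component, so the BIDS pattern only reaches a module directory one level below root_dir and the non-BIDS pattern only reaches one two levels below. Which PHOENIX paths that corresponds to depends on the value of root_dir.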

kcho commented 3 years ago

Yes, it translates to PHOENIX/GENERAL/{STUDY}/{SUBJECT}/surveys/raw.

https://github.com/AMP-SCZ/Phoenix-BIDS/issues/8#issuecomment-858040422