fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Request for min/max used for feature and attribute normalization in input data #27

Closed lipikaramaswamy closed 2 years ago

lipikaramaswamy commented 2 years ago

Hello, I see that the input datasets you've shared in Drive have been normalized. Would you be able to provide the min/max used to scale each feature and attribute, so that an inverse transform can be applied to get samples back in the original scale? Thanks!

fjxmlzn commented 2 years ago

Sorry for the delay.

The link you posted points to the Wikipedia web traffic dataset. Here is the information about it.

For features, we first apply the transform x -> log(1 + x), and then normalize the data to the [-1, 1] range using the global min=0 and max=18.02413958371087 (i.e., data -> data / max * 2 - 1). You can apply the inverse transform accordingly.
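
A minimal sketch of that forward and inverse transform, for anyone who wants to map samples back to raw page-view counts. It assumes `log` means the natural logarithm, which the comment does not state explicitly (the reported max corresponds to roughly 6.7e7 views under that reading). The function names `normalize` and `denormalize` are illustrative and are not part of the DoppelGANger code.

```python
import math

# Global maximum of log(1 + x) reported in this thread; the global minimum is 0.
LOG_MAX = 18.02413958371087


def normalize(x):
    """Forward transform: raw count -> value in [-1, 1]."""
    # Assumption: natural logarithm, via log1p for accuracy near zero.
    return math.log1p(x) / LOG_MAX * 2 - 1


def denormalize(y):
    """Inverse transform: value in [-1, 1] -> raw count."""
    # Undo the scaling to [0, LOG_MAX], then undo log(1 + x) with expm1.
    return math.expm1((y + 1) / 2 * LOG_MAX)


# Round trip on a few made-up counts.
raw = [0.0, 10.0, 12345.0]
scaled = [normalize(v) for v in raw]
restored = [denormalize(v) for v in scaled]
```

Here `restored` matches `raw` up to floating-point error, and a normalized value of -1 maps back to a count of 0.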

For the three attributes (domain, access type, agent), the meanings of the indexes (i.e., 0 ~ 8 for domain, 0 ~ 2 for access type, and 0 ~ 1 for agent) are:

Projects:  ['commons.wikimedia.org', 'de.wikipedia.org', 'en.wikipedia.org', 'es.wikipedia.org', 'fr.wikipedia.org', 'ja.wikipedia.org', 'ru.wikipedia.org', 'www.mediawiki.org', 'zh.wikipedia.org']
Access:  ['all-access', 'desktop', 'mobile-web']
Agents:  ['all-agents', 'spider']
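
The index-to-label mapping above can be sketched as plain lookup tables. The helper `decode_attributes` and the dictionary keys are hypothetical names for illustration. The sketch assumes each attribute is already an integer index; if the attributes are stored as one-hot vectors, take the position of the largest entry first.

```python
# Label lists copied from the comment above; list position is the index.
PROJECTS = ['commons.wikimedia.org', 'de.wikipedia.org', 'en.wikipedia.org',
            'es.wikipedia.org', 'fr.wikipedia.org', 'ja.wikipedia.org',
            'ru.wikipedia.org', 'www.mediawiki.org', 'zh.wikipedia.org']
ACCESS = ['all-access', 'desktop', 'mobile-web']
AGENTS = ['all-agents', 'spider']


def decode_attributes(domain_idx, access_idx, agent_idx):
    """Map integer attribute indexes to their human-readable labels."""
    return {
        'domain': PROJECTS[domain_idx],
        'access': ACCESS[access_idx],
        'agent': AGENTS[agent_idx],
    }


# Example: domain index 2, access index 1, agent index 0.
example = decode_attributes(2, 1, 0)
```
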

lipikaramaswamy commented 2 years ago

Thank you! Appreciate the detail.