PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Getting the number of unique IP addresses by each day (cumulative) #209

Closed by IgorRodchenkov 9 years ago

IgorRodchenkov commented 9 years ago

The "By Day" view loads, but "Cumulative" does not: http://www.pathwaycommons.org/pc2/log/PROVIDER/Reactome/ips After months of using PC2, it seems to take far too long to calculate, and it gets slower as the log grows.

It worked initially (last year) when you clicked the "Cumulative" button. (It still works for access stats, http://www.pathwaycommons.org/pc2/log/PROVIDER/Reactome/stats, but that is a different story: access counts simply add up, unlike the number of unique IP addresses.)

I am afraid there is a design flaw (blame me): for the given category (e.g., PROVIDER, Reactome), it loops over every date up to today and, for each one, queries the log db for the distinct client IP addresses in the period from the start (01/01/2014) to that date.

Uh, that's why our /downloads and /datasources pages became so slow (refs issue #206)

We should probably redesign the system and log db to calculate the number of unique clients daily (e.g., at 12:00 am for the previous day), per category/provider, and store it in the db. Or should we simply remove this "Cumulative" IPs feature from PC2 and leave just the total and daily numbers of unique IPs per log category?
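To make the cost of the per-date re-query concrete, here is a minimal in-memory sketch in Java. It is not the cpath2 code: the `CumulativeUniqueIps` class, the `LogRow` type, and both method names are hypothetical, and the real system runs queries against the log db rather than looping over a list. `naive` mimics the pattern described above (one "distinct IPs from start to date" scan per day), while `singlePass` shows one possible alternative that reads each row once and carries a running set of seen addresses forward.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CumulativeUniqueIps {

    /** Hypothetical log row: one access from one client IP on one day. */
    static final class LogRow {
        final LocalDate date;
        final String ip;

        LogRow(LocalDate date, String ip) {
            this.date = date;
            this.ip = ip;
        }
    }

    /** Naive pattern: for every day, re-scan all rows from start up to that day. */
    static Map<LocalDate, Integer> naive(List<LogRow> rows, LocalDate start, LocalDate end) {
        Map<LocalDate, Integer> out = new LinkedHashMap<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            Set<String> seen = new HashSet<>();
            for (LogRow r : rows) { // stands in for one "distinct IPs" query per day
                if (!r.date.isBefore(start) && !r.date.isAfter(d)) {
                    seen.add(r.ip);
                }
            }
            out.put(d, seen.size());
        }
        return out;
    }

    /** Alternative: group rows by day once, then keep one running set of IPs. */
    static Map<LocalDate, Integer> singlePass(List<LogRow> rows, LocalDate start, LocalDate end) {
        Map<LocalDate, List<String>> byDay = new HashMap<>();
        for (LogRow r : rows) {
            if (r.date.isBefore(start) || r.date.isAfter(end)) {
                continue; // outside the requested period
            }
            byDay.computeIfAbsent(r.date, k -> new ArrayList<>()).add(r.ip);
        }
        Set<String> seen = new HashSet<>();
        Map<LocalDate, Integer> out = new LinkedHashMap<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            seen.addAll(byDay.getOrDefault(d, Collections.<String>emptyList()));
            out.put(d, seen.size()); // cumulative unique IPs up to and including d
        }
        return out;
    }
}
```

Both methods return the same timeline, but the naive one touches every row once per day in the range, so its work grows roughly with (days × rows), which matches the slowdown observed as the log accumulates.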

IgorRodchenkov commented 9 years ago

It looks like I found a better solution, one that no longer requires pre-calculating the cumulative number of unique IP addresses for each event type and name and storing it in the log db (under a special key: date, name, addr='UNIQUE'). The previous solution not only did little to improve the page loading time but also made cpath.h2.db 50% larger right away (and generated lots of unwanted rows with count=0...). So I am removing the LogUtils.UNIQUE_IP constant and the "-log --update" console command option, once again modifying/fixing the log events timeline queries in LogEntityRepositoryImpl (based on spring-data and querydsl), etc.

Will commit/push new changes shortly (almost done).