allinurl / goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
https://goaccess.io
MIT License

top by device-type #10

Closed abgit closed 10 years ago

abgit commented 11 years ago

This is a counter that groups hits by device type. We have at least 3 access types: 1: desktop, 2: mobile devices, 3: other (e.g. crawlers).

Very useful in analytics for deciding on a strategy.

desktop: X hits ( x%) mobile: Y hits (y%) other: Z hits (z%)
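A minimal sketch of the counter described above, assuming a plain keyword match on the user-agent string. The keyword tuples and the `device_type` / `summarize` names are hypothetical and chosen only for illustration; this is not how GoAccess classifies agents.

```python
from collections import Counter

# Hypothetical keyword lists, for illustration only (not GoAccess's tables).
CRAWLER_HINTS = ("bot", "crawler", "spider")
MOBILE_HINTS = ("android", "iphone", "ipad", "blackberry", "windows ce", "mobile")
DESKTOP_HINTS = ("windows nt", "macintosh", "x11", "linux")


def device_type(user_agent):
    """Return 'desktop', 'mobile' or 'other' for one user-agent string."""
    ua = user_agent.lower()
    # Crawlers are checked first so a bot that mimics a browser lands in 'other'.
    if any(hint in ua for hint in CRAWLER_HINTS):
        return "other"
    # Mobile is checked before desktop because Android agents also contain 'Linux'.
    if any(hint in ua for hint in MOBILE_HINTS):
        return "mobile"
    if any(hint in ua for hint in DESKTOP_HINTS):
        return "desktop"
    return "other"


def summarize(user_agents):
    """Map each device type to a (hits, percent) pair."""
    counts = Counter(device_type(ua) for ua in user_agents)
    total = sum(counts.values())
    return {
        kind: (counts[kind], 100.0 * counts[kind] / total if total else 0.0)
        for kind in ("desktop", "mobile", "other")
    }


sample = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/40.0",
    "Mozilla/5.0 (Linux; Android 4.1.2; LG-P760 Build/JZO54K) Mobile Safari/537.36",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
]
for kind, (hits, pct) in summarize(sample).items():
    print(f"{kind}: {hits} hits ({pct:.0f}%)")
```

The order of the checks matters more than the keyword lists themselves: an agent can match several groups, so the most specific group has to be tested first.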

abgit commented 11 years ago

There's already a 'browsers' module that distinguishes crawlers from others and an 'os' module that distinguishes operating systems. The problem here is to distinguish 'desktop' from 'mobile' devices.

Maybe this can only be done by prefixing the operating systems module with the type, or by creating a new module specific to this purpose. Examples: 'Windows NT 4.0' would change from 'Windows' to 'Desktop - Windows'; 'iPhone' would change from 'Macintosh' to 'Mobile - Macintosh'; 'iTunes' would change from 'Macintosh' to 'Desktop - Macintosh'.

What's the best approach for this?

allinurl commented 11 years ago

Not sure about this. So far it's possible to know which ones are mobile (i.e., Android, iPhone, Blackberry, etc) and which ones are desktop. Perhaps we could add a new sub node under the actual OS, however, feels like it wouldn't serve much purpose?

NinnOgTonic commented 9 years ago

@allinurl I think perhaps it would be a good idea to consider reopening this issue?

We would love to have device type information in the JSON output, or alternatively pure UA strings grouped somehow, perhaps? I.e., it would be nice to have Windows NT 6.3; ARM type devices (or those with touch identifiers, perhaps?) not just classified as Windows, but rather as Surface devices and so forth, imo.

aphorise commented 9 years ago

Just a thought for the longer term - can we strive to provide comprehensive / near-complete device coverage? - if we had a concise lexical parallel of Device:to:Browser, even with manufacturer-specific / vendor-specific pollutants :smile: (ms, vs). I'd approximate that a database of ~10k-18k known devices out there would give 97%+ coverage of all that's mobile & maybe in circulation, from NetFront, HTML4, Java, MMS, SymbianOS, WAP, WEB, WHL, Windows, WindowsCE, etc...

Using a few known listings & articles (0, 1, 2, 3, 4) as a guide, the following lexicon may be a good start:

| UA keyword | Device | Full UA sample string of a specific device |
| --- | --- | --- |
| UP. / up.b / up/ | Openwave Mobile Browser / telephone | AUDIOVOX-CDM-8915 UP.Browser/6.2.2.6.h.1.102 (GUI) MMP/2.0 |
| BlackBerry, BlackBerry, UP., up.b, up/ vs | BlackBerry device / telephone | BlackBerry6510/4.0.0 UP.Browser/5.0.3.3 |
| HP, hp-tablet or msvs | HP device / telephone | Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320) |
| HTC, HTC_, HTC- or msvs | HTC device / telephone | HTC_Touch_HD_T8282 Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile 7.11) |
| HTC, HTC_, HTC- or msvs | HTC device / telephone | HTC_S310-Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; Smartphone; 176x220) |
| LGE, LG-, lg-, LG/ | LGE tablet / telephone | LG-T300/V100 Obigo/Q7.3 MMS/LG-MMS-V1.1/1.2 MediaPlayer/LGPlayer/1.0 Java/ASVM/1.1 Profile/MIDP-2.1 Configuration/C |
| LGE, LG-, lg-, LG/ | LGE tablet / telephone | Mozilla/5.0 (Linux; Android 4.1.2; LG-P760 Build/JZO54K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari/537.36 |
| SAMSUNG, samsung, sam-, sam or msvs | Samsung tablet / telephone | SAMSUNG-SGH-A737/UCGI3 SHP/VPP/R5 NetFront/3.4 SMM-MMS/1.2.0 profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 |
| SAMSUNG, samsung, sam-, sam or msvs | Samsung tablet / telephone | samr810 Netfront/3.4 Mozilla/5.0 like Gecko/20060426 |
| sie-, SIE- | Siemens telephone | SIE-SK6R/46 UP.Browser/7.0.2.2.d.1.100(GUI) MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 |
| SonyEricsson, sonyericsson, Sony Tablet, or msvs | SonyEricsson tablet / telephone | SonyEricssonK608i/R2BA Browser/SEMC-Browser/4.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 |
| SonyEricsson, sonyericsson, Sony Tablet, or msvs | SonyEricsson tablet / telephone | SonyEricssonK608i/R301 Profile/MIDP-1.0 Configuration/CLDC-1.0 |

A complete device-focused list of manufacturers & relevant providers in the global context could comprise approximately 142 entries (with more UAs by distributor / OEM?):

A-Z of makes & vendors
Acer, Airness, Alcatel, Allview, Amazon, Amoi, Apple, Asus, AT&T, Audiovox
BlackBerry / RIM, Benefon, Benq, Benq-siemens, Bird, BLU, Bosch
Casio, Cat, Celkon, Chea, Coolpad, Cricket, Dell, DoCoMo (NTT Mobile)
EE, Elson, Emporia, Emblaze, Energizer, Ericsson, Eten, Ezio, Ezze
Fly, Foxconn, Fujitsu Siemens, G3, Garmin-asus, Gigabyte, Gionee
Haier, HP, HTC, Huawei, Jolla
I-mate, I-mobile, Icemobile, Innostream, iNQ, Intex
Konka, Kyocera, Karbonn, Lava, Lenovo, LGE
Maxon, Maxwest, Meizu, Micromax, Microsoft, Mitac, Mitsubishi, Mobilexp, Mobistel, Modelabs, Modu, MWg, Motorola, My Way
Nec, Neonode, Nintendo, NIU, Nokia, nook, Nvidia, O2, Obigo, Onda, Orange, OnePlus, Oppo
Palm, Panasonic, Pantech, Parla, Philips, Phoneone, Psion, Plum, Posh, Prestigio, QMobile, Qtek
Sagem, Samart, Samsung, Sanyo, Sec, Sendo, Sewon, Sharp, Siemens, Skyspring, Sonim, Sony, Sony Ericsson, Spice, Sprint, SPV
T-mobile, Tel.Me., Techfaith, Thuraya, Toshiba
UCWEB, Uniscope, Uriver, Utec, Utstarcom, Vertu, verykool, Virgin, Vitelcom, Vivo, Vk Mobile, Vodafone, Voxtel
Wellcom, Wiko, WND, XCute, Xiaomi, XOLO, Yezz, Yota, YU, ZTE

This list excludes all other potential UAs, such as those by application, crawler, proxy, service, etc., from the 1.5+ million (& growing) already classified in some directories, with others yet to emerge for vehicle / automotive (your car) and television (not sure how unique these are) systems.

I can help in the compilation of the proposed DB.

allinurl commented 9 years ago

@overnine assuming we still want to use the ones posted above, i.e., desktop, mobile, others then I think we could add this as a new panel. Categorizing this could be tricky. Are those the only categories we could use? Should we have tablets as well?

@aphorise having a comprehensive list of browsers and OSs would be awesome, but we may need to refactor browsers.c & opesys.c since currently those are bottlenecks (the run time is proportional to the size of the list, which makes them slow). We could have a large list mapped in a hash, but it would need to be a full-match search. Or bsearch perhaps?
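To illustrate the trade-off mentioned above, here is a rough sketch comparing a linear substring scan with an exact-match hash lookup and a binary search over sorted keys. The `OS_TABLE` contents and all function names are hypothetical; this is not the code in browsers.c or opesys.c.

```python
import bisect
import re

# Hypothetical keyword -> OS table, for illustration only.
OS_TABLE = {
    "android": "Android",
    "blackberry": "BlackBerry",
    "iphone": "iOS",
    "macintosh": "Macintosh",
    "windows": "Windows",
}
SORTED_KEYS = sorted(OS_TABLE)


def linear_lookup(ua):
    """Substring scan: cost grows with the table size, but partial matches work."""
    low = ua.lower()
    for key, name in OS_TABLE.items():
        if key in low:
            return name
    return None


def tokens(ua):
    """Split a UA string into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", ua.lower())


def hash_lookup(ua):
    """Hash lookup: cost grows with the UA length, but a token must match a key exactly."""
    for tok in tokens(ua):
        name = OS_TABLE.get(tok)
        if name is not None:
            return name
    return None


def bsearch_lookup(ua):
    """Binary search over sorted keys: also exact match, O(log n) per token."""
    for tok in tokens(ua):
        i = bisect.bisect_left(SORTED_KEYS, tok)
        if i < len(SORTED_KEYS) and SORTED_KEYS[i] == tok:
            return OS_TABLE[tok]
    return None


android = "Mozilla/5.0 (Linux; Android 4.1.2; LG-P760 Build/JZO54K)"
berry = "BlackBerry6510/4.0.0 UP.Browser/5.0.3.3"
for fn in (linear_lookup, hash_lookup, bsearch_lookup):
    print(fn.__name__, fn(android), fn(berry))
```

The BlackBerry sample shows the full-match caveat: its token is `blackberry6510`, so only the substring scan finds it unless the tokenizer also splits model numbers off the vendor prefix.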

For the record, this may be related to #152.

NinnOgTonic commented 9 years ago

@allinurl I only use goaccess for the JSON output; I don't really have any considerations in terms of usability of the curses interface. Also, the most comprehensive resource I know of that handles this is https://github.com/serbanghita/Mobile-Detect

aphorise commented 9 years ago

@allinurl - I actually think we can do this adaptively / build on a global browser & device context. So instead of having a fixed-size table / bsearch or hash, have a tree that spans / adaptively grows based on new / differing UAs as they are read, and that would only get a naive comparison against a master UA-DB / record once for everything unknown or new.

So basically the UA-DB (lexicon) can have two focuses:

The ideal UA lexicon (adaptive-UA) would be built adaptively from the data set (logs) being parsed, and could be an O(n) algorithm if it's stored in a hash or O(n^2) if it's in some other form (btree, linked list); either way, the scope of any set should in most cases be smaller than complete-UA, O(n) or O(n^2), which is the total space / list of everything that's known & can be targeted.

What'd be even smarter for real-time stats & scenarios where adaptive-UA may grow indefinitely would be to limit the stack to a reasonable size, 2^16 or 65535, thereby dropping any single / one-time occurrences; all that would be needed is a FIFO / FILO (first-in-first-out / first-in-last-out) precedence on the lower portion (say a reserve of 255) so as to always accommodate and record new / one-time agents, which would always register and be visible in the most recent time frames. Ditto governing rules for what can move up the adaptive-UA.
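One possible reading of the size-limited adaptive table, sketched under simplifying assumptions: a single capacity limit, FIFO eviction among agents seen only once, and a fallback to the oldest entry when no one-time agent is left. The class name and the fallback rule are hypothetical, and the separate reserve of 255 slots is not modelled.

```python
from collections import OrderedDict


class BoundedUACounter:
    """Hit counter for user agents that never holds more than `capacity` entries."""

    def __init__(self, capacity=2**16 - 1):
        self.capacity = capacity
        self.counts = OrderedDict()  # iteration order is arrival order

    def add(self, ua):
        """Record one hit, evicting an old entry first if the table is full."""
        if ua in self.counts:
            self.counts[ua] += 1
            return
        if len(self.counts) >= self.capacity:
            self._evict()
        self.counts[ua] = 1

    def _evict(self):
        # FIFO among one-time agents: pick the oldest entry with a single hit.
        victim = next((ua for ua, hits in self.counts.items() if hits == 1), None)
        if victim is None:
            # Assumption: with no one-time agent left, drop the oldest entry.
            self.counts.popitem(last=False)
        else:
            del self.counts[victim]


counter = BoundedUACounter(capacity=3)
for ua in ["a", "a", "b", "c", "d"]:
    counter.add(ua)
print(list(counter.counts.items()))
```

With a capacity of 3, the arrival of `d` pushes out `b`, the oldest agent seen only once, while the repeated agent `a` keeps its count.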

I hope I'm clearly conveying the intent.

aphorise commented 9 years ago

For reference, the RAM / memory space requirements to hold all potential UAs can be expressed as:

# for 65535 records
MAX / unlikely case
(2^16)-1=65,535 bytes per record max
65535^2=4,294,836,225 bytes (~4 Gbytes)
AVG q3 / possible upper-quartile
(2^11)-1=2,047 bytes per record avg-q3
2,047*65535=134,150,145 bytes (~134 Mbytes)
AVG q1 / possible lower-quartile
(2^9)-1=511 bytes per record avg-q1
511*65535=33,488,385 bytes (~33 Mbytes)
AVG / common sub-or-at-average
(2^8)-1=255 bytes per record real average
255*65535=16,711,425 bytes (~16 Mbytes)

This is on the assumption that no UAs are alike (all unique for 2^16-1) and that none shall ever exceed 64 Kbytes, which is unreasonably high & frankly even graver than stupid if neared or exceeded. The most obvious candidates of illogical uses which tend to exceed the common 512-1024 (bytes) are, for example, Internet Explorer (IE) or other debug or build-specific text that may be embedded as part of the UA identifier; even these tend to be < 8 Kbytes at worst.
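The estimates above are simply bytes-per-record multiplied by the record count, so they can be re-derived with a few lines. This is only a sketch of that arithmetic; the `CASES` labels and the `total_bytes` helper are hypothetical names.

```python
RECORDS = 2**16 - 1  # 65,535 unique user agents

# Assumed record sizes in bytes for each case discussed above.
CASES = {
    "max": 2**16 - 1,
    "avg-q3": 2**11 - 1,
    "avg-q1": 2**9 - 1,
    "avg": 2**8 - 1,
}


def total_bytes(bytes_per_record, records=RECORDS):
    """Memory needed to hold `records` strings of `bytes_per_record` bytes each."""
    return bytes_per_record * records


for label, size in CASES.items():
    total = total_bytes(size)
    print(f"{label:7s} {size:>6,d} B/record -> {total:>13,d} bytes (~{total / 1e6:,.0f} MB)")
```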

A few other rules can also be applied as objective determinants for anything not matching, which can have a final criteria scope of device &/or browser:

^^^ The same approach can also be applied initially for new / unknown UAs, so as to further scope the lookup by device-ua first, then other-ua, otherwise falling back to the first / best initial detection (if applicable).