Open da2x opened 10 years ago
This is probably one of those things that depends on the type of traffic the site gets. I'm thinking that perhaps adding a panel or a dialog that displays all the unknown user agents would help the user to expand the list, if needed.
Same problem here.
and here
I'll look further into this. Thanks.
@daniel-gomes-sociomantic, @Aeyoun & @cganterh - can you kindly provide a sample of the target / known `User-Agent` (UA) strings and/or the related `OS` / `Browser` that you're dealing with, yet are not showing?
It would be great to see UA strings that are (humanly) reasonable to assume to be of an `OS` and/or `browser`, yet are not being parsed, categorised and understood as expected.
Sorry, I haven't used this software lately.
@aphorise Here is a sample from my unfinished site's access_log and goaccess config to parse it.
Looks to me like Baiduspider (`Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)`) is getting categorized into unknown OS.
@2vek - thank you for the sample. Out of interest, does the `BlackBerry9000` device / OS show, and are all the UAs that you've provided (4 variants, I think) not showing?
@allinurl - I'd say (related to #10) - if we conclude with or use a single UA-DB, which would hold all `browser`:`device`:`other-ua` for lookups, then the `other-ua` portions will predominantly be the service/bot/crawler agents as per what's listed on the public directories.
I think `services`, `bots` & `crawlers` are all fitting appellations, subject to what's matched / recognized. The slight difference between `bot` & `crawlers` is that the latter tend to be search-engine specific.
@aphorise the UA-DB discussed on #10 sounds like it would be an interesting approach, however, now that I think about it, I'm curious how UA versioning would work...
@allinurl - if I've not misunderstood you - there would be no versioning, only a comprehensive and complete listing. So where a UA-DB is present, only that is used, and even the current conditions you have (regex style, as per what's in `browsers.c` & `opesys.c`), when compiled into the same list, would be one (`1x`) of `X` conditions. The only version difference would be that of the `UA-DB`, which would naturally gain more records in the future. Where there is no UA-DB, the current approach / conditional checks that you have can work fine, or an in-memory build of it that compiles to a complete list / directory (hash table) of all permutations, used in case of an unfulfilled targeted match (by `device`:`os`:`browser` in UA-DB).
I'm going to see if I can mock something standalone around this & earlier discussed (hashing) ideas.
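To make the idea above concrete, here is a minimal standalone sketch of a hash-table lookup with a fallback to ordered substring checks. It is not GoAccess code: `UA_DB`, `FALLBACK_RULES`, `UAInfo` and `classify` are hypothetical names, and the category labels are only examples of what such a table might hold.

```python
from typing import NamedTuple


class UAInfo(NamedTuple):
    device: str
    os: str
    browser: str


# Hypothetical UA-DB: exact User-Agent string -> classification.
UA_DB = {
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)":
        UAInfo("other-ua", "Unknown", "Crawlers"),
}

# Hypothetical fallback rules, checked in order (substring -> browser label).
FALLBACK_RULES = [
    ("Baiduspider", "Crawlers"),
    ("Chrome", "Chrome"),
    ("Firefox", "Firefox"),
]


def classify(ua: str) -> UAInfo:
    """Try the exact-match table first, then the substring rules."""
    hit = UA_DB.get(ua)
    if hit is not None:
        return hit
    for token, browser in FALLBACK_RULES:
        if token in ua:
            return UAInfo("Unknown", "Unknown", browser)
    return UAInfo("Unknown", "Unknown", "Unknown")
```

With this shape, growing the table never changes the matching code; only the fallback list needs ordering care.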
@aphorise a mock would be great :+1:. Thanks for clarifying this a bit more.
@aphorise all requests for `BlackBerry9000` show under the "others" section in OS. That log does contain a few UAs that are showing up fine. I admit I was not very thorough in filtering.
@allinurl putting Baiduspider in the "others" section should be fine. I think the OS of a bot should not matter to the end user.
Baidu is a search engine bot.
(Not posting URIs because some are spamish.) Comments after #-symbol.
Some unknown user agents by category (1000 hits or more in the last 48 hours):
**Feed readers:**
AppleNewsBot
Feedbin feed-id:<int> - <int> subscribers
Superfeedr bot/2.0 http://superfeedr.com - Make your feeds realtime: get in touch - feed-id:<int>
Mozilla 5.0 (compatible; Feedio.co Feed Crawler/1.0; +<uri>)
Mozilla/5.0 (compatible; OperaDiscoverBot/2015.01; <uri>)
Mozilla/5.0 (compatible; inoreader.com-like FeedFetcher-Google)
alertmix crawler/1.0 (a news crawler; <uri>; <email>)
Mozilla/5.0 (compatible; theoldreader.com; <int> subscribers; feed-id=<hash>)
Digg Feed Fetcher 1.0 (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
FeedBurner/1.0 (<uri>)
**Others (high number of requests):**
Mozilla/5.0 (compatible; spbot/4.4.2; +<uri> )
Go 1.1 package http
Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +<uri>)
Mozilla/5.0 (compatible; Gluten Free Crawler/1.0; +<uri>)
Mozilla/5.0 (compatible; MJ12bot/v1.4.5; <uri>?+)
Google favicon
Mozilla/5.0 (Windows NT 10.0; Trident/7.0; FunWebProducts; yie9; rv:11.0) like Gecko
NerdyBot
Microsoft-WNS/10.0 -- Fetches Live Tiles for Windows 10's Start Menu. Almost an RSS reader? Kind of.
com.apple.Safari.SearchHelper/11601.2.3 CFNetwork/760.1.2 Darwin/15.0.0 (x86_64) # [OpenSearch in Safari](https://www.aeyoun.com/webdev/safari-quick-website-search.html)
Y!J-ASR/0.1 crawler (<uri>)
WinHTTP
Sogou web spider/4.0(+<uri>)
Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +<uri>)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36
ltx71 - (<uri>
Qwantify/1.0
NetLyzer FastProbe (See <uri> for info))
**Browsers:**
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.8.1000 Chrome/30.0.1599.101 Safari/537.36
**Advertising:**
Mediapartners # Google AdSense, 10 000 hits a month
ADmantX Platform Semantic Analyzer Appnexus - ADmantX Inc. - <uri> - <email>
Mozilla/5.0 (compatible; proximic; +<uri>)
Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +<uri>)
Nutch/2.2.1 (page scorer; <uri>)
Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
@Aeyoun Thanks for posting this. Are these not being recognized under the browsers or os panel, or both?
This was about browsers. Maxthon is identified as Chrome. The others end up in "Unknown".
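The Maxthon case is typical of substring-based detection: its UA string also contains `Chrome` and `Safari`, so whichever token is checked first wins. A minimal sketch of the usual remedy, checking the more specific token first, is below. `BROWSER_TOKENS` and `detect_browser` are hypothetical names, and this is an illustration rather than how GoAccess necessarily orders its own list.

```python
# Order matters: a browser whose UA embeds another browser's token
# must be listed before that other browser.
BROWSER_TOKENS = [
    ("Maxthon", "Maxthon"),
    ("Chrome", "Chrome"),
    ("Safari", "Safari"),
]


def detect_browser(ua: str) -> str:
    """Return the label of the first token found in the UA string."""
    for token, name in BROWSER_TOKENS:
        if token in ua:
            return name
    return "Unknown"
```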
We could make some educated guesses to improve on OS detection, though. Google should get their own OS. Because they don't use anything known and certainly use all custom hardware and software. Here is a User-Agent to OS matching:
Pulling the above out of the unknown OS category should drop my unknowns from 68% to 57%.
IMO `google` is not an `OS` and should not get a separate category. `Unknown` or `Unidentified` may be more fitting where no `OS` has been declared or included in the UA. There are some good guesses that can be made for typical or common use cases / scenarios involving `crawlers`, `bots` & `services` in general; however, they'd remain a guess / assumption at best.
Having the same issue:
6 - Operating Systems
Total: 1/1
Hits Vis. % Bandwidth Data
------- ---- ------- ----------- ----
1401742 3459 100.00% 0.0 B Unknown
7 - Browsers
Total: 1/1
Hits Vis. % Bandwidth Data
------- ---- ------- ----------- ----
1401742 3459 100.00% 0.0 B Unknown
@areis422 Do you have the right format?
Since it only recognized 1 browser/OS (all unknown), it seems like you may not have the right log format. Please double-check that; otherwise, feel free to post a few lines from your log and the log format being used.
Standard Apache logs, using (CLF):
time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b
Switched to NCSA format and I'm getting browsers and O/S now. Sorry for the bother.
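For anyone hitting the same symptom: the Common Log Format ends at the byte count, so a CLF log line has no User-Agent field at all, while the NCSA combined format appends a quoted referrer and a quoted User-Agent. The sketch below shows the difference with two regular expressions. The sample lines and the `user_agent` helper are made up for illustration, and the patterns are a simplified assumption, not GoAccess's parser.

```python
import re

# Common Log Format: host, ident, user, [timestamp], "request", status, bytes.
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')

# NCSA combined: the CLF fields plus quoted referrer and User-Agent.
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$'
)


def user_agent(line: str):
    """Return the User-Agent, or None when the line is plain CLF."""
    m = COMBINED.match(line)
    if m:
        return m.group(7)
    if CLF.match(line):
        return None  # valid CLF line, but the format carries no User-Agent
    raise ValueError("unrecognized log line")


# Hypothetical sample lines.
clf_line = '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 2326'
combined_line = clf_line + ' "-" "NerdyBot"'
```

Every line parsed as CLF therefore yields no UA to classify, which matches the 100% "Unknown" panels above.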
It would be great to be able to see the different "Unknown" UAs. I'm currently using GoAccess with an API, so I set my own unique UA in my client app.
@plusCubed #560 will add the ability to load your custom list of browsers. Based on this comment, I'll probably add some way of displaying some of the most popular UAs from the unknown category.
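Until something like that exists, the most frequent unrecognized UAs can be tallied outside GoAccess. The sketch below assumes the UA strings have already been extracted from the log; `KNOWN_TOKENS` and `top_unknown` are hypothetical names, and the token list is a stand-in, not GoAccess's actual browser list.

```python
from collections import Counter

# Hypothetical stand-in for the list of recognized browser tokens.
KNOWN_TOKENS = ("Chrome", "Firefox", "Safari", "MSIE")


def top_unknown(user_agents, limit=10):
    """Count UAs that match no known token; return the most common ones."""
    unknown = Counter(
        ua for ua in user_agents
        if not any(tok in ua for tok in KNOWN_TOKENS)
    )
    return unknown.most_common(limit)
```

For example, `top_unknown(["NerdyBot", "NerdyBot", "WinHTTP", "Mozilla/5.0 Chrome/30.0"], 2)` gives the two unrecognized agents with their hit counts.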
Strange HUAWEI + Android + Facebook + X
See https://github.com/allinurl/goaccess/issues/997
Just a general bug tracking what all my pull requests have been about.
For my own sites, I am still at 62 % unknown for OS and 42 % unknown for browsers.