TheWeatherChannel / dClass

Device Classification Engine
Apache License 2.0

Documentation #12

Open fillmore50 opened 9 years ago

fillmore50 commented 9 years ago

Hi, is there more documentation on the syntax of dTrees anywhere? The little I see is confusing, particularly when compared to the deviceMap dtree example. Aren't B base type patterns supposed to support other patterns and return no result in themselves?

Also, I don't get how the "1.0" strings can be representative of meaningful patterns. For ex:

"1.0";"MD301H";C;"MD301H";"vendor"="ZTE","model"="MD301H ","parentId"="genericZTE","inputDevices"="-","displayHeight"="220","displayWidth"="176","device_os"="-","ajax_support_javascript"="true","is_tablet"="false","is_wireless_device"="true","is_crawler"="false","is_desktop"="false"
"1.0";"Pixi/1.0";C;"Pixi/1.0";"vendor"="Palm","model"="Pixi","parentId"="genericPalm","inputDevices"="touchscreen","displayHeight"="400","displayWidth"="320","device_os"="Palm webOS","ajax_support_javascript"="true","is_tablet"="false","is_wireless_device"="true","is_crawler"="false","is_desktop"="false"
"1.0";"TouchPad/1.0";C;"TouchPad/1.0";"vendor"="HP","model"="TouchPad","parentId"="genericHP","inputDevices"="touchscreen","displayHeight"="1024","displayWidth"="768","device_os"="webOS","ajax_support_javascript"="true","is_tablet"="true","is_wireless_device"="true","is_crawler"="false","is_desktop"="false"

According to the README, "1.0" is the pattern. It would make a lot more sense if "Pixi/1.0" or "TouchPad" were the patterns... confused.

Finally, can I build a pattern that contains, for example: "Android 4.1" and "HTC One S" anywhere in the original input? how do I build the pattern in the dTree to match that?

In the test.dtree I see:

del,del=del;delt;S;;var=value1;value2,v2=two

Does it mean that I can use commas for the case above? I would imagine that to be a strong pattern, right? Thanks

rezan commented 9 years ago

The only documentation is what is provided in this project:

https://github.com/TheWeatherChannel/dClass/blob/master/dtrees/README

I apologize for the cryptic format, it's geared for power, not usability or readability... :( Also, dtree (and dClass) was made to conform to existing specs, so it carries a bit of baggage when compared to a system built from the ground up with no pre-existing requirements...

Aren't B base type patterns supposed to support other patterns and return no result in themselves?

Correct. There are subtle complexities when parsing OpenDDR which require base types to share ids with non-base types and define key/values. However, I should probably revisit why this is done. So I will follow up on this.

Also, I don't get how the "1.0" strings can be representative of meaningful patterns.

All those 1.0 are chain types meaning they depend on other patterns. So for MD301H, the pattern is: "md301h" and "1.0". This is just the way the patterns are defined, there are 2 components. Sure, it could be redefined as "md301h 1.0", but that needs to be done in the DDR. dClass cannot redefine the pattern itself.
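
To make the two-component idea concrete, here is a minimal Python sketch (not dClass code). It assumes a chain entry counts as matched only when both its parent's pattern and its own pattern occur in the input. `chain_matches` is a made-up helper, the sample inputs are invented, plain substring search stands in for dClass's real token matching, and any ordering requirement between the two components is ignored.

```python
def chain_matches(user_agent: str, parent_pattern: str, chain_pattern: str) -> bool:
    """Return True when both pattern components occur in the input.

    Illustrative assumption only: case-insensitive substring search,
    no tokenization, no ordering constraint between the components.
    """
    ua = user_agent.lower()
    return parent_pattern.lower() in ua and chain_pattern.lower() in ua


# "md301h" is the parent (base) component, "1.0" the chain component.
print(chain_matches("ZTE-MD301H/1.0 Browser", "md301h", "1.0"))  # True
print(chain_matches("SomeOther/1.0 Browser", "md301h", "1.0"))   # False
```

Seen this way, "1.0" is never meaningful on its own; it only narrows a match that the parent pattern has already established.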

Finally, can I build a pattern that contains, for example: "Android 4.1" and "HTC One S" anywhere in the original input? how do I build the pattern in the dTree to match that?

Of course:

#test dtree
#$regex
#!unknown
#pattern      ;id      ;type  ;parent  ;key=values
android 4\.1  ;a41     ;S     ;        ;android=true,version=4.1
htc one s     ;htc1s   ;S     ;        ;phone=true,version=1s 

Note the #$regex and #!unknown directives... so I admit, there is a bit of a learning curve to getting this stuff to work properly and the documentation doesn't make those directives very clear. You could also have used #$partial instead of #$regex and changed the \. to just a plain dot (.).
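
The reason for the \. is ordinary regex behaviour: an unescaped . matches any single character. The sketch below uses Python's `re` module purely to illustrate that point. dClass has its own matcher, so this is an analogy rather than its implementation, and the sample inputs are made up.

```python
import re

# Escaped dot: only a literal "." is accepted between "4" and "1".
escaped = re.compile(r"android 4\.1", re.IGNORECASE)
# Unescaped dot: any single character is accepted in that position.
unescaped = re.compile(r"android 4.1", re.IGNORECASE)

for text in ("Linux; Android 4.1; HTC One S", "Linux; Android 401"):
    print(text, bool(escaped.search(text)), bool(unescaped.search(text)))
```

The first input matches both patterns; the second matches only the unescaped one, which is the kind of false positive the escape avoids.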

fillmore50 commented 9 years ago

Thank you. This helps. I am still perplexed about the expressiveness of a dTree in relation to common detection use cases. What if I wanted to represent the same device running different versions of the OS as separate entries? Say, HTC One running OS 4.1, 4.2 and 4.3 respectively? The approach you showed seems to assume that the properties of those profiles are neatly disjoint...

as far as openddr goes, it seems to me that the project is pretty much dead and poorly documented anyway....at the very least it would be nice to understand more about how one goes from the openddr files to the dclass dtree...

rezan commented 9 years ago

What if I wanted to represent the same device running different versions of the OS as separate entries

You can either keep them disjoint or combine them. I would recommend keeping OS detection separate from device detection. But you can combine them:

"htc one";htc1;B;;
"4\.1";htc1_41;C;htc1;os_version=4.1
"4\.2";htc1_42;C;htc1;os_version=4.2
"4\.3";htc1_43;C;htc1;os_version=4.3

as far as openddr goes, it seems to me that the project is pretty much dead

Correct. dClass has moved over to DeviceMap for its DDR.

http://devicemap.apache.org/

fillmore50 commented 9 years ago

I really meant devicemap when I wrote openddr. The connection between the devicemap schema and dtrees does not seem to be explained anywhere. I am looking at http://wiki.apache.org/devicemap/DataSpec2

I am guessing that there is some connection between the XMLs and the generated dTree, but exactly what relation is left as a (difficult) exercise. I would be happier with simply understanding what is possible with dclass and dtrees.

rezan commented 9 years ago

The connection between the devicemap schema and dtrees does not seem to be explained anywhere

There is a translation layer which loads OpenDDR XML. If you then output it back out to dtree, you get the conversion. If you compile dClass into a standalone binary, you can just use the -d and -o params.

https://github.com/TheWeatherChannel/dClass/blob/master/src/devicemap_client.c

I am looking at http://wiki.apache.org/devicemap/DataSpec2

That's the DeviceMap 2 data specification. That's going to be a large departure from the OpenDDR specification. As far as I am aware, the OpenDDR data spec was never documented.

I would be happier with simply understanding what is possible with dclass and dtrees

Any kind of pattern matching is possible. So the documentation in this project and this thread is a good place to get started. So let me know what you are trying to do and I would be glad to help.

fillmore50 commented 9 years ago

let's say I want a string containing the following tokens to return the following KV pairs:

"Android 4.1" AND "SGH-I9000" => key=Val1
"Android 4.2" AND "SGH-I9000" => key=Val2
"Android 4.1" AND "SGH-I9300" => key=Val3

what would the relative dTree look like?

thanks

rezan commented 9 years ago

Something like this:

# dtree for os and device detection
#$regex
#$dups
#@unknown

# base os patterns

#pattern      ;id        ;type  ;parent  ;key=values
android 4\.1  ;a41       ;B     ;        ;
android 4\.2  ;a42       ;B     ;        ;

# device patterns

#pattern      ;id        ;type  ;parent  ;key=values
SGH-I9000     ;sghi941   ;C     ;a41     ;model=SGH-I9000,android=4.1
SGH-I9000     ;sghi942   ;C     ;a42     ;model=SGH-I9000,android=4.2
SGH-I9300     ;sghi9341  ;C     ;a41     ;model=SGH-I9300,android=4.1
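
As a rough illustration of how those rows combine, here is a small Python model (not dClass code). It assumes a chain (C) row fires only when its own pattern and its parent base (B) row's pattern both occur in the input. The tuple layout, the `classify` helper and the sample inputs are invented for this sketch, and the patterns are treated as plain substrings rather than regexes.

```python
# (pattern, id, type, parent id, key/values), mirroring the rows above.
ENTRIES = [
    ("android 4.1", "a41", "B", None, {}),
    ("android 4.2", "a42", "B", None, {}),
    ("SGH-I9000", "sghi941", "C", "a41", {"model": "SGH-I9000", "android": "4.1"}),
    ("SGH-I9000", "sghi942", "C", "a42", {"model": "SGH-I9000", "android": "4.2"}),
    ("SGH-I9300", "sghi9341", "C", "a41", {"model": "SGH-I9300", "android": "4.1"}),
]


def classify(user_agent):
    """Return (id, key/values) of the first chain row whose parent also matched."""
    ua = user_agent.lower()
    # Base rows return nothing themselves; they only enable their children.
    matched_bases = {
        entry_id
        for pattern, entry_id, kind, _, _ in ENTRIES
        if kind == "B" and pattern.lower() in ua
    }
    for pattern, entry_id, kind, parent, kvs in ENTRIES:
        if kind == "C" and parent in matched_bases and pattern.lower() in ua:
            return entry_id, kvs
    return "unknown", {}


print(classify("Mozilla/5.0 (Linux; U; Android 4.2; SGH-I9000 Build/X)"))
print(classify("Mozilla/5.0 (Linux; U; Android 4.2; SGH-I9300 Build/X)"))
```

The first call resolves to the sghi942 row; the second falls through to "unknown" because no row pairs SGH-I9300 with the a42 base.
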

fillmore50 commented 9 years ago

Thanks. I am a bit perplexed about the semantics of this (from the test dtree)

test8 test9;test89;C;test7;

Does this mean that both tokens must be in the string, with exactly one space between them? Would it be the same as this?

"test8 test9";test89;C;test7;

btw, what's the role of capitalization?

rezan commented 9 years ago

Yes, the pattern to be matched is:

test8 test9

If you are talking about case in pattern matching... all pattern matching is US-ASCII case insensitive.
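
For readers unsure what "US-ASCII case insensitive" implies in practice: only the letters A-Z and a-z are folded together, and anything outside ASCII is compared as-is. A minimal Python sketch of that comparison follows; the helper names are made up, and how dClass implements it internally is not shown in this thread.

```python
def ascii_lower(text: str) -> str:
    """Lower-case only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def ascii_iequal(a: str, b: str) -> bool:
    """Compare two strings, ignoring case for US-ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_iequal("TEST8 Test9", "test8 test9"))  # True
print(ascii_iequal("É", "é"))                      # False: non-ASCII is not folded
```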

fillmore50 commented 9 years ago

I see that "_" is being used as a separator to tokenize strings: dtree_client.h#L131

#define DTREE_HASH_SCHARS " _-/()."

Does that mean that I cannot match "8_3" or "8_1_2" as tokens?

fillmore50 commented 9 years ago

Another question. Do Pattern IDs need to be unique? I looked at the devicemap.dtree and there doesn't seem to be a strict requirement for uniqueness. Am I missing something?

 25 "SM-G900"
 24 "SM-N910"
 15 "genericWebBot"
 12 "desktopDevice"
  9 "genericPhone"
  7 "SM-T800"
  7 "SM-T330"
  7 "DROID BIONIC 4G"
  6 "SM-T530"
  6 "SM-T230"
  6 "BlackBerry 9650"
  5 "T-Mobile myTouch 3G"
  5 "SonyEricssonR800at"
  5 "SM-T700"
  5 "P510e"
  5 "HTC Dream"
  5 "HTC_DesireS_S510e"

rezan commented 9 years ago

Does that mean that I cannot match "8_3" or "8_1_2" as tokens?

No, DTREE_HASH_SCHARS are dual purpose. They are both token separators and pattern matchable. So _ is pattern matchable.

If you were to try and use !, that isn't pattern matchable. So it would be replaced with a regex wildcard . and be a more fuzzy match. You could either accept the less than exact matching or add the char to DTREE_HASH_TCHARS and allow for an exact match.
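
A rough Python sketch of that fallback, under stated assumptions: letters, digits and the separator characters discussed in this thread are treated as matchable, and every other character in a pattern becomes the regex wildcard `.`. The `MATCHABLE` set and both helper names are invented for illustration and do not come from the dClass source.

```python
import re
import string

# Assumption: ASCII letters, digits and these separators match literally.
MATCHABLE = set(string.ascii_letters + string.digits + " _-/().")


def to_match_regex(pattern: str) -> str:
    """Keep matchable characters literal; turn anything else into '.'."""
    return "".join(re.escape(ch) if ch in MATCHABLE else "." for ch in pattern)


def matches(pattern: str, text: str) -> bool:
    return re.search(to_match_regex(pattern), text, re.IGNORECASE) is not None


print(matches("8_3", "iPhone OS 8_3 like Mac OS X"))  # exact: "_" is matchable
print(matches("8_3", "iPhone OS 8x3 like Mac OS X"))  # no match
print(matches("yahoo!", "YahooX Slurp"))              # fuzzy: "!" became "."
```

The last call shows the cost of the fuzzy match: a pattern written with ! also accepts inputs that have some other character in that position.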

Do Pattern IDs need to be unique?

No. Patterns with matching ids will share a few attributes:

Btw, thanks for the interest and discussion. I actually had to reread the readme to answer several of your questions. The good thing is that everything in this thread is documented in the readmes. However, it is a bit coarse and difficult to grasp. So I actually think asking questions (as you are doing) is best! So thanks again :)

fillmore50 commented 9 years ago

What's the syntax for pattern IDs? Any string, including one that contains spaces (as long as it is in quotes)?

Also, does semi-colon ( ; ) carry a special meaning? is the following pattern going to behave as expected?

for example, the "NOKIA; Lumia 635" pattern does not seem to match the following:

'Mozilla/5.0 (Mobile; Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 635; Vodafone) like iPhone OS 7_0_3 Mac OS X'

but "NOKIA; Lumia 635" will....

thanks

rezan commented 9 years ago

does semi-colon ( ; ) carry a special meaning

The semicolon is a reserved character, along with the comma and the equals sign (in the context of the key/values). If you use them, you must use double quotes. Ex:

"NOKIA; Lumia 635";nokia "lumia" id!! 635, 1=1;S;;nokia="true, two="2"",test=false,abc=""123""

The above pattern matches your test string. The id is: nokia "lumia" id!! 635, 1=1. The key nokia has the value: true, two="2". The key test has the value false, and the key abc has the value "123".

fillmore50 commented 9 years ago

Can you elaborate on the ID?

nokia "lumia" id!! 635, 1=1

why the quotes? why the exclamation marks? why the 1=1 ? thanks

rezan commented 9 years ago

No reason, just demonstrating that any character is legal other than what I mentioned above.

joeyhub commented 5 years ago

There's also missing documentation on how to generate the browser dtree.