freme-project / e-Entity

Assign topics to entities #44

Closed m1ci closed 9 years ago

m1ci commented 9 years ago

Enrich each named entity with a list of topics. The topics will be derived from the dcterms:subject information in DBpedia. Output example:

<http://freme-project.eu/#char=0,3>
    a                     nif:Word , nif:String , nif:Phrase , nif:RFC5147String ;
    nif:anchorOf          "W3C"^^xsd:string ;
    nif:beginIndex        "0"^^xsd:int ;
    nif:endIndex          "3"^^xsd:int ;
    nif:referenceContext  <http://freme-project.eu/#char=0,33> ;
    itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Organization> ;
    itsrdf:taIdentRef     <http://dbpedia.org/resource/World_Wide_Web_Consortium> .

<http://dbpedia.org/resource/World_Wide_Web_Consortium> dcterms:subject dbc:Consortia ,
    dbc:Web_development ,
    dbc:Organizations_established_in_1994 ,
    dbc:World_Wide_Web_Consortium ,
    dbc:Standards_organizations ,
    dbc:Web_services ,
    dbc:International_nongovernmental_organizations .

This includes two actions:

1) process Wripl data and hand the results back to Wripl for validation and feedback
1.1) provide the data in TSV
1.2) the data should contain only one record for each entity (remove duplicates)
2) incorporate the feedback and implement this as a feature of e-Entity

nilesh-c commented 9 years ago

@m1ci regarding 2) - Since we will be adding scoring and it'll be part of the NIF output, I guess it makes sense to have the addition of topics on the freme-ner side. But using SPARQL (when we integrate the feature into the API) will be slow and add complexity. We can use something really simple like a map of categories here, or do a fast key-value read from our sqlite3 database.
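A minimal sketch of the key-value read suggested above, using Python's standard sqlite3 module. The table name `entity_topics`, its two columns, and the sample rows are assumptions made for illustration; the thread does not describe the actual schema of the freme-ner database.

```python
import sqlite3

W3C = "http://dbpedia.org/resource/World_Wide_Web_Consortium"
CAT = "http://dbpedia.org/resource/Category:"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical entity_topics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_topics (entity TEXT NOT NULL, topic TEXT NOT NULL)"
    )
    # The index on the entity column keeps per-entity lookups fast.
    conn.execute("CREATE INDEX idx_entity_topics ON entity_topics (entity)")
    conn.executemany(
        "INSERT INTO entity_topics (entity, topic) VALUES (?, ?)",
        [
            (W3C, CAT + "Consortia"),
            (W3C, CAT + "Web_development"),
            (W3C, CAT + "Standards_organizations"),
        ],
    )
    conn.commit()
    return conn


def topics_for(conn: sqlite3.Connection, entity_uri: str) -> list:
    """Return all topic URIs stored for one entity URI, sorted alphabetically."""
    cur = conn.execute(
        "SELECT topic FROM entity_topics WHERE entity = ? ORDER BY topic",
        (entity_uri,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    for topic in topics_for(build_demo_db(), W3C):
        print(topic)
```

An indexed lookup like this avoids a SPARQL round trip per entity, which is the concern raised in the comment.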

koidl commented 9 years ago

We will also need the returned topics as plain text labels. In the example below we have the underscores... let me know how best to do this? [...]

http://dbpedia.org/resource/World_Wide_Web_Consortium dcterms:subject dbc:Consortia , dbc:Web_development , dbc:Organizations_established_in_1994 ,

[...]

m1ci commented 9 years ago

@koidl thanks for this reminder, yes we will include the topic URI and its label. E.g.,

dbc:World_Wide_Web_Consortium dcterms:subject dbc:Consortia
dbc:Consortia rdfs:label "Consortia"

m1ci commented 9 years ago

excerpts from email conversation:

Mail from Milan, Sep 8th 2015

Hi Kevin,
I am sending attached the results from processing a few documents.
Can you please have a look at whether the output containing the enrichments is OK, i.e. readable enough for you and your customers? If it is OK, I'm going to process all the data you sent to us.

The output is the same as we agreed in Turin - doc_id and doc_text followed by the entity mentions in that document - for each entity we provide:
* surface form, entity link, entity type, list of entity topics.
Each entity is included only once per document in the output.
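The output layout described in this mail can be sketched as follows. This is only an illustration: the dictionary keys, the choice of (surface form, link) as the duplicate key, and the comma-joined topic column are assumptions based on the sample rows in this thread, not the actual export code.

```python
import csv
import io


def dedupe_entities(mentions):
    """Keep the first record for each (surface form, link) pair in one document."""
    seen = set()
    unique = []
    for mention in mentions:
        key = (mention["surface"], mention["link"])  # assumed duplicate key
        if key not in seen:
            seen.add(key)
            unique.append(mention)
    return unique


def write_document(out, doc_id, doc_text, mentions):
    """Write one document row followed by one TSV row per unique entity."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([doc_id, doc_text])
    for m in dedupe_entities(mentions):
        writer.writerow([m["surface"], m["link"], m["type"], ",".join(m["topics"])])


if __name__ == "__main__":
    us = {
        "surface": "United States",
        "link": "http://dbpedia.org/resource/United_States",
        "type": "Location",
        "topics": ["United States", "G20 nations"],
    }
    buffer = io.StringIO()
    write_document(buffer, 1, "Some document text.", [us, dict(us)])
    print(buffer.getvalue(), end="")
```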

Response from Kevin, Sep 8th 2015

Hey Milan

Thanks for this - looks great but not sure if I get it yet.... 

First line is the text:
"This research deliverable presents an overview of the U.S. heart health ingredients market. The investment highlights depicts the market definitions and the key takeaways. In the overview of the U.S. heart health ingredients, [...]"

This is followed by four values in each row:

A)United States
B)http://dbpedia.org/resource/United_States
C)Location
D)"1776 establishments in the United States,United States,Former British colonies,English-speaking countries and territories,Member states of the United Nations,Member states of NATO,G20 nations,Superpowers,States and territories established in 1776,Liberal democracies,Former confederations,Federal constitutional republics,G7 nations,Republics,G8 nations"

So 

A = Entity Label (surface form)
B = Entity Link
C = Entity Type
D = A List of Topics for the Entity Found (In this case "United States")

That's a lot of labels and topics now.... it's blowing up more instead of narrowing, but that is fine for now. Just need to make sure I got it right?

The strategy to narrow this list would be:

1. Provide a list of 'customer specific labels (aka domain specific taxonomy)' and simply check if the customer labels match with either a surface form or a topic. The ones that don't match end up in a list of unmatched labels. This may turn out to be far too narrow, however, therefore 2)

2. Extend the customer/domain specific taxonomy with Wikipedia links for each label so that we can train the allocation of surface forms and topics even if they don't match the label 100% as they would have to in 1)

Let me know if this makes sense

Response from Milan on Kevin's email, Sep 8th 2015:

Yes, you got it right.

= domain specific taxonomy
We will need to map the domain specific taxonomy to DBpedia Wikipedia categories/topics. Let's first focus on filtering out "non-important" topics.

= narrowing down the list of topics
We can simply compute the "informativeness" of each topic. The assumption is that more frequent topics are less informative than rare topics. Then we can use this information to filter out non-informative topics associated with entities in your documents. We can compute these "informativeness" values directly from DBpedia.

What do you think?

... Kevin, the categories are also structured in the form of a taxonomy. The taxonomy is also part of DBpedia and we can re-use it. See the "skos:broader" info for the Consortia topic.

Response from Kevin, Sep 8th 2015:

= domain specific taxonomy We will need to map the domain specific taxonomy to DBpedia Wikipedia categories/topics. Let's first focus on filtering out "non-important" topics.

Agreed.

= narrowing down the list of topics We can simply compute the "informativeness" of each topic. The assumption is that more frequent topics are less informative than rare topics. Then we can use this information to filter out non-informative topics associated with entities in your documents. We can compute these "informativeness" values directly from DBpedia.

I would need to see examples. Yes, we really need to narrow. This will be nice for the general purpose play which we decided on solving first. I am a little bit worried though that we can't narrow the entities, which seem to be too many too. But let's take it step by step; maybe via the categories we can narrow the entities? Again, the OpenCalais Social Tags (which are apparently based on Wikipedia categories) somehow manage to do this really nicely (not more than 3-5 terms, which mostly make sense). Looking forward to seeing if our strategy will work.

For the domain specific case the narrowing (I think, but could be wrong) will need smart mapping (not only strong matching but actual learning) due to many categories being too broad or maybe not even existing.

What do you think?

Exciting stuff - lets do it!

... the conversation will continue in this thread.

jnehring commented 9 years ago

I have an idea on how to narrow down the list of topics. Actually we are not interested in categories of entities, but in topics of the whole text. So we aggregate the list of entity categories over the whole document and create a table like this (it contains sample data from an imaginary document about a finance company from New York):

| Category | Number of occurrences | Percentage of occurrences (number of occurrences divided by total number of categories) |
|---|---|---|
| Finance | 20 | 46.5 |
| Business in New York | 17 | 39.5 |
| Companies established in 1961 | 3 | 7 |
| English-speaking countries and territories | 2 | 4.7 |
| Mercury Prize-winning albums | 1 | 2.3 |

Now we can filter for only the most important categories, e.g. for categories that make up more than 20% of the occurrences. Or we just take the top 3 topics. In that way we extract topics for the whole document and filter out some noise.
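The aggregation idea above could look roughly like the sketch below. The function name and parameters are hypothetical, and the share is computed against the total number of category assignments in the document, as in the sample table.

```python
from collections import Counter


def document_topics(entity_categories, min_share=None, top_k=None):
    """Aggregate per-entity category lists into document-level topics.

    entity_categories: one list of category labels per spotted entity.
    min_share: keep only categories whose share of all assignments exceeds it.
    top_k: keep only the k most frequent categories.
    Returns (category, count, share) tuples, most frequent first.
    """
    counts = Counter(cat for cats in entity_categories for cat in cats)
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = [(cat, n, n / total) for cat, n in counts.most_common()]
    if min_share is not None:
        ranked = [row for row in ranked if row[2] > min_share]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


if __name__ == "__main__":
    # Sample data shaped like the table above: one category list per entity.
    sample = (
        [["Finance"]] * 20
        + [["Business in New York"]] * 17
        + [["Companies established in 1961"]] * 3
        + [["English-speaking countries and territories"]] * 2
        + [["Mercury Prize-winning albums"]]
    )
    for cat, n, share in document_topics(sample, min_share=0.2):
        print(f"{cat}\t{n}\t{share:.1%}")
```

Both filters can be combined, for example `document_topics(sample, min_share=0.05, top_k=3)`.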

koidl commented 9 years ago

Looks nice

Whats the status though - just need to know how, when, where :-)

koidl commented 9 years ago

Are we starting a separate thread for the 'domain specific' challenge?

Looks like 'http://www.wandinc.com/' is going to give wripl a 90 days evaluation license which would allow us to pull the labels and add wikipages (and some more if useful) to each label for training.

What do you think? Will that help?

m1ci commented 9 years ago

@jnehring: thanks for the idea. We were actually thinking of such ranking of the categories.

@koidl:

Whats the status though - just need to know how, when, where :-)

We have already created counts datasets for the DBpedia categories. We computed the number of entities assigned with the DBpedia categories. Next, we will do the following:

BTW, I'm not sure whether we should do the filtering of categories on the e-Entity side or whether this processing should be done on the consumer side (Wripl or any other). Please keep in mind that e-Entity is an entity recognition service. e-Entity does entity recognition, and possibly ranking of the entities and of entity related information. As long as the development directly concerns the process of entity recognition (spotting, linking, classification, entity ranking), we can provide support. Additional processing on top of this data, including how the data is aggregated and consumed, is out of the scope of e-Entity.

m1ci commented 9 years ago

Are we starting a separate thread for the 'domain specific' challenge?

Yes, I've created one https://github.com/freme-project/e-Entity/issues/46

Lets keep this issue only for discussion on the assignment of topics to entities in general.

There is one more issue on "scoring mechanism for topics" https://github.com/freme-project/e-Entity/issues/45

m1ci commented 9 years ago

@koidl can you please have a look at the output below and check whether it is OK? After the section with the list of entities I added a section with the top-10 most informative categories. The list of all assigned categories was ranked according to their informativeness - less probable resources are considered to be more specific, and consequently more informative, than the more common ones.

1 "This study covers the state of the North American positive displacement pumps market, examining drivers and restraints for growth, pricing, distribution, technology, demand, and end-user trends. Market growth for regional and market segments is forecasted. In addition, an in-depth analysis of the competitive situation including market participant's market shares is performed. The base year is 2012 with forecasts running through 2019. The market is further divided into three sub-segments including rotary positive displacement pumps, reciprocating positive displacement pumps, and peristaltic positive displacement pumps. A detailed analysis of each of the sub-segments is included.Key Questions This Study Will Answer- Is the market growing? - How long will it continue to grow and at what rate? - Are the existing competitors structured correctly to meet customer needs? - Is this an industry or a market? - Will these companies/products/services continue to exist, or will they be acquired by other companies? - Will the products/services become features in other markets? - How will the structure of the market change over time? Is it ripe for acquisitions? - Are the products/services offered today meeting customer needs, or is additional development needed?"
North American http://dbpedia.org/resource/North_America N/A type World Digital Library related,Regions of the Americas,Continents,North America
Will http://dbpedia.org/resource/Will_and_testament Person Wills and trusts,Inheritance,Death customs,Common law
Questions This Study Will Answer N/A link N/A type N/A categories

North America 17.82127572776396
Regions of the Americas 15.94680660984782
Continents 15.406238228485117
Inheritance 12.535873508901712
Wills and trusts 12.266686876086323
Common law 11.971610000848393
Death customs 11.624878514960459
World Digital Library related 8.236939532692272

koidl commented 9 years ago

Milan looks good for this example. Did you test it with something more specific? For example the Wikipedia page of Michael Jackson or the about page of Trinity College Dublin. I just need to see if we are now running the risk of being too high level.

m1ci commented 9 years ago

No, I didn't. Let me try, and I will post the results.

m1ci commented 9 years ago

for the first 4 paragraphs from https://en.wikipedia.org/wiki/Michael_Jackson I get the following top-10 topics with the highest informativeness scores

Marshals 15.94680660984782
Music videos directed by Bob Giraldi 15.94680660984782
Humanities occupations 15.94680660984782
Recipients of Thiri Thudhamma Thingaha 15.94680660984782
Michael Jackson concert tours 15.821275727763961
People acquitted of sex crimes 15.821275727763961
Grand Collars of the Order of Saint James of the Sword 15.705798510344026
Grand Cordons of the Order of Independence (Tunisia) 15.705798510344026
African-American male dancers 15.598883306427513
Magazines established in 1917 15.598883306427513

koidl commented 9 years ago

Its much better however there are a good few things in there that seem strange e.g. 'Magazines established in 1917' or even 'Marshals'

Here the list that comes in from OpenCalais for the same content:

[image: screen shot 2015-09-09 at 14 45 10]

It seems immediately more accurate and more useful. We dont need to be as good as OpenCalais (although I always silently hoped we would be better) however if 'Recipients of Thiri Thudhamma Thingaha' (https://en.wikipedia.org/wiki/Thiri_Thudhamma_Thingaha) comes up in relation to Michael Jackson I might have a small problem.

Also, I am fully aware that the OpenCalais terms are not entities.

However the entities in OpenCalais look a lot more spot on too:

[image: screen shot 2015-09-09 at 14 49 54]

What do you think? Should we test this with some more content?

koidl commented 9 years ago

Here the link to where the opencalais images are coming from:

http://viewer.opencalais.com/

m1ci commented 9 years ago

Its much better however there are a good few things in there that seem strange e.g. 'Magazines established in 1917' or

Magazines established in 1917 is included because the entity Forbes was spotted in the text and this category is assigned to it.

even 'Marshals'

This category occurs because the entity Tito (http://dbpedia.org/resource/Josip_Broz_Tito) was (incorrectly) spotted in the text, and it has the Marshals category assigned.

however if 'Recipients of Thiri Thudhamma Thingaha' (https://en.wikipedia.org/wiki/Thiri_Thudhamma_Thingaha) comes up in relation to Michael Jackson I might have a small problem.

'Recipients of Thiri Thudhamma Thingaha' is a topic assigned to the spotted entity Tito.

However the entities in OpenCalais look a lot more spot on too:

Here are the FREME NER spotted entities, just for comparison:

"Artist http://dbpedia.org/resource/Artist Organization
Jackson 5 http://dbpedia.org/resource/Jackson_5 Organization
Conrad Murray http://dbpedia.org/resource/Conrad_Murray Person
Bad http://dbpedia.org/resource/Bad_(album) N/A type
Jackson http://dbpedia.org/resource/Jackson%252C_Mississippi Person
Grammy Lifetime Achievement Award http://dbpedia.org/resource/Grammy_Lifetime_Achievement_Award N/A type
"Thriller" http://dbpedia.org/resource/Thriller_(genre) Person
"Scream http://dbpedia.org/resource/Scream_(1996_film) Organization
Billboard Hot 100 http://dbpedia.org/resource/Billboard_Hot_100 Organization
Grammy Awards http://dbpedia.org/resource/Grammy_Award N/A type
Michael Jackson http://dbpedia.org/resource/Michael_Jackson Person
Thriller http://dbpedia.org/resource/Thriller_(genre) N/A type
"Artist of the Century N/A link N/A type
Jackie http://dbpedia.org/resource/Jackie_Jackson Person
Forbes http://dbpedia.org/resource/Forbes Organization
Joseph Jackson http://dbpedia.org/resource/Joe_Jackson_(manager) Person
Jermaine http://dbpedia.org/resource/Jermaine_Jackson Location
Los Angeles County Coroner http://dbpedia.org/resource/Los_Angeles_County_Department_of_Medical_Examiner-Coroner Location
American http://dbpedia.org/resource/United_States N/A type
"Black http://dbpedia.org/resource/Race_and_ethnicity_in_the_United_States_Census Person
Songwriters Hall of Fame http://dbpedia.org/resource/Songwriters_Hall_of_Fame Organization
Billie Jean http://dbpedia.org/resource/Billie_Jean Person
Dangerous http://dbpedia.org/resource/Dangerous_(Michael_Jackson_album) N/A type
Grammy Legend Award http://dbpedia.org/resource/Grammy_Legend_Award N/A type
Off the Wall http://dbpedia.org/resource/Off_the_Wall_(album) N/A type
American Music Awards http://dbpedia.org/resource/American_Music_Award N/A type
White http://dbpedia.org/resource/Race_and_ethnicity_in_the_United_States_Census Person
Marlon http://dbpedia.org/resource/Marlon_Dingle Person
HIStory http://dbpedia.org/resource/History N/A type
This Is It http://dbpedia.org/resource/This_Is_It_(concerts) N/A type
MTV http://dbpedia.org/resource/MTV Organization
"Love Never Felt So Good" N/A link N/A type
Jackson family http://dbpedia.org/resource/Jackson_family Person
Hot 100 http://dbpedia.org/resource/Billboard_Hot_100 Organization
Dance Hall of Fame http://dbpedia.org/resource/Tap_Dance_Hall_of_Fame Organization
Tito http://dbpedia.org/resource/Josip_Broz_Tito Person

Yes, there are mistakes, but there is also nonsense in the results of OpenCalais. Examples of wrongly spotted entities: promotional tool, artist, dancer, first artist, etc.

What do you think? Should we test this with some more content?

We can, just let us know. Should I process the documents from the chemical domain?

Note that I sent only the top-10 most informative topics assigned to the entities occurring in the document. The list is much longer. We can include, 20, 30...

Currently, all topics are collected and ranked, and the top-10 are returned. We might try with entity types instead of topics, e.g. types from the DBpedia ontology, Wikidata, YAGO, UMBEL, schema.org.... or a combination of all of them. The DBpedia ontology has 735 entity types, while YAGO, for example, has over 350K types. FYI, there are over 900K DBpedia topics (which we are using now).

Lets see what are your thoughts on this.

koidl commented 9 years ago

Interesting - thanks for this. Helps me to understand this more. Some mistakes are fine. Absolutely fine.

Few questions/comments (feel free to respond inline):

1) FREME NER results. Are those all or just a subset above?
2) In your example are you using FREME NER or dbpedia spotlight?
3) The topic above with the percentage. What does that value mean e.g. 15.6?
4) Please try it with the chemical data so we can see how it works (even with a small subset of it)
5) Tell me more about the types you mention. Can we test and compare or is it too much work to investigate.
6) possible related to 5 - For the chemical domain data for example the relationship to the CAS list is useful. For example does the content relate to https://en.wikipedia.org/wiki/List_of_CAS_numbers_by_chemical_compound#B. I am not sure though if these are topics or entities or just a list of labels/links. This might also bring us back to the domain specific which is not the topic here at the moment I guess.

If okay with you lets keep it moving. My feeling is that we need to test different approaches with different content in a systematic way. For example 10 pages of people, 10 pages of organisations, 10 pages of 'general' topic and see how it works with the approach above compared with using entity types.

When using entity types would we get a score back too?

Hope above makes sense?

m1ci commented 9 years ago

1) FREME NER results. Are those all or just a subset above?

all

2) In your example are you using FREME NER or dbpedia spotlight?

sure FREME NER.

3) The topic above with the percentage. What does that value mean e.g. 15.6?

It says how informative the category is. These values are computed based on how many entities have this category assigned. Categories with fewer entities are more informative than those with more assigned entities. See all the categories and their scores here http://rv2622.1blu.de/datasets/dbpedia-categories/dbpedia-categories-counts.ttl

4) Please try it with the chemical data so we can see how it works (even with a small subset of it)

OK, will process the chemical data.

5) Tell me more about the types you mention.

Entity types are attached using rdf:type. See all the types assigned to http://dbpedia.org/page/Berlin - search for rdf:type.

Can we test and compare or is it too much work to investigate.

Yes we can. Let me provide an example.

6) possible related to 5 - For the chemical domain data for example the relationship to the CAS list is useful. For example does the content relate to https://en.wikipedia.org/wiki/List_of_CAS_numbers_by_chemical_compound#B.

Hm... let's see which entities come out of FREME NER, i.e. which entities are spotted. Then, if 1) we know the domain (we do - it's chemical) and 2) we know the list of relevant entities (or topics) for this domain (we might generate such a list of relevant entities/topics), we might, in a post-processing stage, keep only the entities that match this domain-relevant list of topics.
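The post-processing step described above can be sketched like this. The entity records and the list of relevant topics are made-up placeholders; how such a domain list would be generated is left open in the thread.

```python
def filter_by_domain(entities, relevant_topics):
    """Keep only entities that share at least one topic with the domain list."""
    relevant = set(relevant_topics)
    return [e for e in entities if relevant.intersection(e["topics"])]


if __name__ == "__main__":
    # Hypothetical spotted entities with their assigned topics.
    spotted = [
        {"surface": "entity A", "topics": ["Iron compounds", "Phosphates"]},
        {"surface": "entity B", "topics": ["U2 songs"]},
    ]
    # Hypothetical list of topics considered relevant for the chemical domain.
    chemical_topics = ["Iron compounds", "Phosphates"]
    for entity in filter_by_domain(spotted, chemical_topics):
        print(entity["surface"])
```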

I am not sure though if these are topics or entities or just a list of labels/links.

I don't know how well chemical compounds will be recognized as entities. Let me process the data.

jnehring commented 9 years ago

I think sometimes it might be hard to decide if you want high or low informativeness. I am thinking about the categories Sports with a low informativeness and Sports in St. Louis / Missouri with a high informativeness. In a general purpose topic extraction system, one would want a text to be labeled with topic Sport. In the sports domain we might be more interested in Sports in St. Louis / Missouri.

koidl commented 9 years ago

@m1ci sounds good - Like the idea with the post filter by the way. It might not be smart enough though. We need to consider Fuzzy Matching for example (in finance domain in this case) - 'ETF' and 'Exchange Trade Fund' are the same and need to relate to the same label. But again I guess thats the domains specific challenge in which we add links to the taxonomy so that there is learning?

Let me know if I am confusing things ... also no problem if you want to set up a prio/task list for this to make sure we all stay on the same page

koidl commented 9 years ago

@jnehring yes, that's the idea. However, it would be nice for FREME NER to have a slider that allows the level of informativeness to be adjusted. Then the end user can decide whether they want just Sports, just Sports in St. Louis / Missouri, or both.
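One way to picture such a slider is a lower and upper bound on the informativeness score. The sketch below is an assumption about how this could work on the consumer side; the function name and the two sample scores are hypothetical.

```python
def select_by_informativeness(scored_topics, low=0.0, high=float("inf")):
    """Return (topic, score) pairs with low <= score <= high, highest first."""
    selected = [(t, s) for t, s in scored_topics.items() if low <= s <= high]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores: the broad topic is less informative than the narrow one.
    scores = {"Sports": 6.0, "Sports in St. Louis": 14.0}
    print(select_by_informativeness(scores, high=10.0))  # broad topics only
    print(select_by_informativeness(scores, low=10.0))   # specific topics only
    print(select_by_informativeness(scores))             # both
```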

m1ci commented 9 years ago

I think sometimes it might be hard to decide if you want high or low informativeness. I am thinking about the categories Sports with a low informativeness and Sports in St. Louis / Missouri with a high informativeness. In a general purpose topic extraction system, one would want a text to be labeled with topic Sport. In the sports domain we might be more interested in Sports in St. Louis / Missouri.

Hm... makes sense, we might reverse the ranking - and include the top-10 non-informative categories :) However, more informative categories better describe the content of the document compared to less informative categories.

We need to consider Fuzzy Matching for example (in finance domain in this case) - 'ETF' and 'Exchange Trade Fund' are the same and need to relate to the same label.

This is task of entity spotting and linking.

But again I guess thats the domains specific challenge in which we add links to the taxonomy

Yes

so that there is learning?

Learning? if we manage to map taxonomy to types/categories then no learning is needed.

koidl commented 9 years ago

@m1ci Sounds good. Ill let you do some testing. Ping me if you need anything. By the way I am talking to Andi over email at the moment to see how e-terminology can help too. I dont want to confuse things though therefore I wont pull this together just yet

koidl commented 9 years ago

Just checking in- whats the status and do we (wripl) need/can do anything to help?

m1ci commented 9 years ago

processing the data from the chemical domain, will hand them over by the afternoon.

koidl commented 9 years ago

The data looks a lot better now - thanks for this

These are wikipedia categories right?

What next?

m1ci commented 9 years ago

These are wikipedia categories right?

Yes.

What next?

Please check if the Wikipedia categories are OK for your "General Purpose Topic extraction" use case. If yes, then next week we will integrate this as part of e-Entity: attach the categories and the corresponding "informativeness" values to the entities. Sorting and filtering the top-K categories will then be done on the Wripl side.

koidl commented 9 years ago

Sounds great - I'll get back to you early next week

thanks

koidl commented 9 years ago

Hi

We get some really nice ones such as:

Id: 11 Barnidipine Hcl- Barnidipine Hcl Market Research Report 2011

But then we get some that are mostly off (not all though): Id: 12
Songs about The Troubles (Northern Ireland)
Music videos directed by Anton Corbijn
Phosphates
Iron compounds
Song recordings produced by Flood (producer)
Songs written by Adam Clayton
Songs written by Larry Mullen, Jr.
Songs written by The Edge
Songs written by Bono
U2 songs

Why is for example: Songs about The Troubles (Northern Ireland) coming up with a high confidence? Just that I understand more how it works.

However I suggest we use this and move it to dev on the API. We are getting some problems here with some Entities in general. For example 'NOT' which comes up a lot and makes little sense in the analytics. My hope is that by using the categories the labels in the dashboard analytics will make more sense too.

Let me know what you think. We can also test better once its in the API and we see what the data looks like over all active websites wripl is serving.

In relation to the domain specific I would assume that issues such as 'Songs about The Troubles (Northern Ireland)' would then go away due to FREME knowing that the content is in the domain 'Chemical' which has nothing to do with 'Politic' or 'Entertainment'?

When you are ready I will continue the conversation on the domain specific (especially how to use data such as the CAS list or custom taxonomies).

Good work by the way with the categories! Thanks

koidl commented 9 years ago

@m1ci just wondering what the status is regarding my last post: a) some small mis-spottings and b) when will the categories be available via the API and what will the return data structure look like?

Thanks kevin

m1ci commented 9 years ago

Why is for example: Songs about The Troubles (Northern Ireland) coming up with a high confidence? Just that I understand more how it works.

I think I already explained how we do the scoring of the topics: 1) we collect all topics which are associated with the entities occurring in the document, 2) we sort them and 3) we return the top-10 most informative. Topics which are assigned to fewer entities in DBpedia are considered more informative. Those which are assigned to more entities are considered less informative. To compute the informativeness values we use formulas from information theory. See https://en.wikipedia.org/wiki/Self-information
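For readers unfamiliar with self-information: the score of a topic is the negative logarithm of its probability, I(t) = -log2(p(t)), where p(t) can be estimated as the number of entities carrying the topic divided by a total count. The sketch below assumes a base-2 logarithm and a hypothetical total; the exact base and denominator used for the published dataset are not stated in this thread.

```python
import math


def informativeness(entity_count: int, total_entities: int) -> float:
    """Self-information in bits of a topic assigned to entity_count entities."""
    if entity_count <= 0 or entity_count > total_entities:
        raise ValueError("entity_count must be between 1 and total_entities")
    probability = entity_count / total_entities
    return -math.log2(probability)


if __name__ == "__main__":
    total = 1024  # hypothetical total, chosen so the results are round numbers
    for count in (1, 4, 256):
        print(count, informativeness(count, total))
```

Rare topics get high scores and frequent topics get low scores, which matches the ranking behaviour described in the comment.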

The dataset with the topics counts and topics informativeness can be downloaded from here http://rv2622.1blu.de/datasets/dbpedia-categories/dbpedia-categories-counts.ttl

b) when will the categories be available via the API and what will the return data structure look like?

From today we start with integrating the topics in e-Entity. Will keep you updated. The output you will receive will look like this:

<http://freme-project.eu/#char=0,3>
    a                     nif:Word , nif:String , nif:Phrase , nif:RFC5147String ;
    nif:anchorOf          "W3C"^^xsd:string ;
    nif:beginIndex        "0"^^xsd:int ;
    nif:endIndex          "3"^^xsd:int ;
    nif:referenceContext  <http://freme-project.eu/#char=0,33> ;
    itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Organization> ;
    itsrdf:taIdentRef     <http://dbpedia.org/resource/World_Wide_Web_Consortium> .

<http://dbpedia.org/resource/World_Wide_Web_Consortium> dcterms:subject dbc:Consortia ,
    dbc:Web_development ,
    dbc:Organizations_established_in_1994 ,
    dbc:World_Wide_Web_Consortium ,
    dbc:Standards_organizations ,
    dbc:Web_services ,
    dbc:International_nongovernmental_organizations .

<http://dbpedia.org/resource/Category:Consortia> rdfs:label "Consortia" ;
    fr:info     "15.598883306427513"^^<http://www.w3.org/2001/XMLSchema#double> .

On the client (Wripl) side you'll need to 1) collect the topics, 2) sort them according to the informativeness values, and 3) pick the top-N - you can choose N yourself. OK?
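The three client-side steps can be sketched as below. The sketch works on plain dictionaries rather than on parsed NIF/Turtle, and everything except the Consortia score from the example output is a made-up placeholder.

```python
def top_n_topics(entities, topic_info, n=10):
    """Collect topics over all entities, sort by informativeness, keep the top n."""
    # 1) collect: the set removes topics shared by several entities
    topics = {t for e in entities for t in e["topics"]}
    # Topics without a known score are skipped.
    scored = [(t, topic_info[t]) for t in topics if t in topic_info]
    # 2) sort: highest score first, label as a tie-breaker for stable output
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # 3) pick the top n
    return scored[:n]


if __name__ == "__main__":
    entities = [
        {
            "link": "http://dbpedia.org/resource/World_Wide_Web_Consortium",
            "topics": ["Consortia", "Web development"],
        }
    ]
    topic_info = {
        "Consortia": 15.598883306427513,  # value from the example output
        "Web development": 11.0,          # hypothetical value
    }
    for label, score in top_n_topics(entities, topic_info, n=1):
        print(label, score)
```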

a) some small mis-spottings

If you refer to

For example 'NOT'

It's hard to investigate these problems without the source text. I suspect this is because you are sending "ugly" text for processing by FREME NER. Also, I have a feeling that these strings are not part of regular sentences. However, it is hard to say without the source text.

koidl commented 9 years ago

@m1ci thanks for this.

Sounds all good from here. We will investigate the data issue again; however (not to annoy you), we never had any of these issues with OpenCalais, therefore we have to investigate deeper. Also, in relation to the spotting of 'NOT', for example, I will dig out some examples to keep us moving.

The new dashboard is being received very well here at the conference, therefore all good so far.

Not to put pressure on you, but if you have an (even tentative) timeline for the API integration I can start allocating resources accordingly.

koidl commented 9 years ago

@m1ci hey milan quick question. Do we get a score for each category? Above you only have one or am I getting it wrong?

thanks k.

m1ci commented 9 years ago

@m1ci hey milan quick question. Do we get a score for each category? Above you only have one or am I getting it wrong?

Yes, there will be a score for each category. The above is just an example - one entity with one category. In reality you will receive more entities with associated categories, each with its own score attached.

We will investigate the data issue again, however (not to annoy you) we never had any of these issues with OpenCalais therefore we have to investigate deeper.

Maybe they did post-processing and removed entities on a "black list".

Also in relation to the spotting of 'NOT', for examples, I will dig out some examples to keep us moving.

Yes, concrete examples are more than welcome.

Not so put pressure on but if you (even tentative) have a timeline for the API integration I can start allocating resources accordingly.

Hopefully by this Friday.

koidl commented 9 years ago

@m1ci Great looking forward to it. Thanks

johnmcauley commented 9 years ago

I can provide examples of the not issue, will do this evening.

m1ci commented 9 years ago

I can provide examples of the not issue, will do this evening.

OK, please do.

johnmcauley commented 9 years ago

Hi guys,

Attached is a list of problem texts that return - http://dbpedia.org/resource/Not. Each row contains the anchor and entity, followed by the text.

If you look at the texts, each contains the following sentence - This is a FREE report from Insider Monkey. Credit Card is NOT required. - which appears to return the entities Credit Card, Not and Free software. I have tried several of these texts with the FREME API and get those entities each time.

I know the text is a bit spammy but this is a big problem for us.

Thanks for all your help,

j

John McAuley

[{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: July 3, 2013 at 9:23 am The housing market is definitely on the mend. Depending on how you want to slice the cattle, you can make a lot of money. I think though that not every investor is interested?in or?willing to take on an inordinate amount of risk. Because of this, I am going to lay out some key macroeconomic indicators, and get to the meat of the argument?as to?whether or not investors should even have a position in housing stocks. The economics, can?t ignore these Source: Ycharts The trend is your friend and the housing market is picking back up again. In certain areas of the United States, the amount of money spent on a mortgage is cheaper than the price of rent. Assuming that?the number of people employed increases and the economy continues to recover, the housing recovery should be well on its way. Source: Federal Reserve Going forward, the real?gross domestic product?is projected to grow at a 2.3% to 2.5% rate. If that is the case, investors should position themselves in housing because?the housing stocks would appreciate rapidly in a cyclical economic rebound. KB Home just announced earnings KB Home (NYSE:KBH) reported a fairly strong quarter. The company was able to increase its revenues by 73% year-over-year (this is a significant improvement; I?wonder where all the bears went on this one.) 
The company is continuing to recover. The company?s deliveries were up by 39% year-over-year, and the average selling price of homes?grew by around 25% year-over-year. The company?s property backlog is up by an additional 19%. Perhaps back logs are up because investors are fearful of missing out on the next leg-up in the property market. The company reported a loss of $0.04 per share versus?a year-ago period loss of $0.31 per share.?It also reported net income growth. Because revenues were up by $221 million, costs were comparatively up by $197 million. The net difference between the two was what contributed to the company?s net income. Analysts on a consensus basis were anticipating the company to report a $0.06 loss for the quarter, but KB Home (NYSE:KBH) beat analyst estimates by $0.02. Investable insights & another alternative Investors should consider buying a home. Ignore bonds and enjoy the safety of an appreciating real estate portfolio. Now I?m not saying that a home should be your only investment; I am saying that homebuilders are selling homes for ever higher prices. You want to buy on an up-trend.?The trend is your friend, after all, and it is obviously the time to own a bit of the American dream. If?owning a home is a little bit risky, however, why not consider?The Home Depot, Inc. (NYSE:HD)? The company is exposed to the housing sector through home improvement sales. After someone buys a home there?s usually a lot to fix, a lot to upgrade, and a lot to buy. Everything from gardening improvement, paint changes, pipe fixes, toilet replacement, and counter top changes can all be done at The Home Depot, Inc. (NYSE:HD). The company?s stock currently trades at a bit of a hefty valuation (with a 20.5 forward earnings multiple.)?In 2012, the company?was able to grow its earnings per share by 21.5%. The growth in earnings was driven by operating profit margins improving by 93 basis points to 10.39%. 
The company also repurchased $4 billion in shares, which also contributed to earnings-per-share growth. Analysts are pretty optimistic?about the company?s future. Disciplined cost management, paired with stronger macroeconomic indicators and share buybacks, will grow the company?s earnings going forward. The company?s?stock is projected to grow its earnings by 14.61% per year over the next five years. Why?not own the bank? I think that?Bank of America Corp (NYSE:BAC) could be the most well-positioned bank in terms of earnings growth (I?ll have a separate article dedicated towards the financial sector soon.) The company has a large portfolio of higher-risk securities because in all likelihood, higher-rated (safe) securities are being dumped in favor of riskier assets. The risk premium on BBB-rated bonds is 1.57 currently, which is below the long-run average of 1.867. Assuming that Bank of America Corp (NYSE:BAC) accumulated its BBB mortgages and bonds when risk premiums were above the long-run average, you can basically assume that the company is better positioned than other banks. Source: Bank of America Around 57% of the bank?s assets are below a BBB rating, which implies that the bank is less exposed to coupon note depreciation. It is assumed that the interest rates from the lower-rated securities could make up for the bank?s mark-to-market accounting losses from depreciating AAA-rated securities. After all, treasury bonds are AAA-rated assets and those are declining in value right now. What a bank should own are lower-rated securities that pay a higher rate of interest. Those higher rates of interest would make up for the depreciation on higher-quality debt. Fortunately, Bank of America has positioned itself for this already. The CEO, Brian Moynihan, also plans to cut back on spending by $8 billion by the year 2015. This is why analysts on a consensus basis anticipate?that the?company?will?grow its earnings by 23.39% per year over the next five years. 
The stock has 41.3 earnings ratio right now, which is reasonable when considering the projected rates of growth. Conclusion Investors need exposure to housing in their investment portfolio. Owning an actual house could be the most lucrative choice right now, but there are other options as well. The home ownership population has declined and the total number of households have gone up, so there?s a lot of pent-up demand which can be reflected in the backlog figures presented by KB Home (NYSE:KBH). Using that, as a leading indicator, we can also assume that demand for mortgages and home improvement will be up as well. Therefore, investors should consider a position in?companies such as KB Home (NYSE:KBH), The Home Depot, Inc. (NYSE:HD), and Bank of America Corp (NYSE:BAC). The article The Housing Recovery Is Offering Lucrative Investment Opportunities originally appeared on Fool.com and is written by Alexander Cho. Alexander Cho has no position in any stocks mentioned. The Motley Fool recommends Bank of America and Home Depot. The Motley Fool owns shares of Bank of America. Alexander is a member of The Motley Fool Blog Network ? entries represent the personal opinion of the blogger and are not formally edited. Copyright ? 1995 ? 2013 The Motley Fool, LLC. All rights reserved. The Motley Fool has a disclosure policy . Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. 
] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. 
] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By Javier Hasse in Commodities , News Published: June 9, 2014 at 11:16 am On Friday June 6, Andrew Brown, CIO at Emerging Capital Partners, was interviewed at CNBC and talked about investment opportunities with great potential in the African continent. Mr. Brown highlights that his private equity firm is a Pan-African investor, which implies that it endows businesses in the entire continent, not only in South Africa, as many assume. In fact, he states, opportunities in South Africa are less interesting that those present in the rest of the continent. In terms of where the opportunities are in the continent, most people talk about Nigeria, ?because of the young demographic and rapidly growing population? (CNBC interviewer). Mr. Brown further explains that everybody tends to focus on Nigeria because ?it?s a single country with a lot of people.? However, Emerging Capital Partners looks beyond this, and seeks to reach the same population size, delivering products and services, by endowing companies with presence in several smaller countries. He continues, ?The dynamic you?re seeing in Nigeria is a dynamic that?s playing out across Africa. The challenge is how you actually build businesses that can operate and address that market need.? When considering investing in Africa, one must take into account that, as a continent, it is growing at 5% per year, and this growth rate is accelerating. Actually, this recently resulted in the Work Bank upgrading its forecast to 6% for the continent. ? But what about the risk? Well, Mr. Brown?s job as a fund manager is to manage that risk in order to get stable returns. ?I can?t tell you there is no country or political risk across Africa, but there are certainly lots of businesses that aren?t really impacted by political risk per se. And then, when we invest, we like to build platform companies that are operating across a number of countries (?) 
and that provides a diversification not only at the portfolio company level, but then when you aggregate that to the fund level, we have a very diversified portfolio,? Brown assures. Emerging Capital Partners? portfolio comprises investments in 45 out of 54 countries across Africa, and includes telecoms, commodities, and food and drink stocks, amongst others. Its assets under management surpass the $2 billion threshold. Finally, he talks about Africa?s shift towards a consumer-driven economy: ?I think what you?re seeing come through ?Brown assures- is an emerging consumer class and we?re looking to make investments that will provide good quality, well priced, goods and services into that emerging consumer class.? So, maybe, it could be time to consider investing in Africa, and helping this continent, its economy, and its people, often left behind, develop. Watch the full interview: Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. 
] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: June 19, 2013 at 2:22 pm Microsoft Corporation (NASDAQ: MSFT ) recently announced Office for iOS (but not iPad), a great sign for the company?s future in cloud-based office suites ? if not for the future of the Surface. The move is part of a long-standing trend toward web- and cloud-based document software, which?Google Inc (NASDAQ: GOOG ) pioneered years ago.?Apple Inc. (NASDAQ: AAPL ) is now finally dipping more than a toe in with the unveiling of iWork in Cloud at this year?s WWDC. Let?s take a look at current cloud-based office offerings from Microsoft, Apple Inc. (NASDAQ: AAPL ), and Google, and what they mean for investors. Microsoft Corporation (NASDAQ: MSFT ) Redmond shook things up a couple of years ago with their announcement of a subscription-based Office suite. Dubbed Office 365, the program provides access to the Office suite and other Microsoft products for a variable fee per year. With the release of Microsoft Office 2013, the company went all-in, developing a Home Premium version catering to regular consumers, and an education flavor for students looking to save money. By Microsoft?s account, the suite is selling pretty well , and it?s no wonder ? the price is right, and wide adoption of Office software means it?s the standard in many organizations and industries. This cloud-based suite is one of the things Microsoft is getting right these days, and I think it?s a prescient move that will cement its place as king of the Office suite for a few more years. With a billion Office users , the company has a pretty big hill to stand on. And that?s important for a company whose Windows division has seen flat growth in the wake of the Windows 8 debacle. On the other hand, Office 365 has driven growth in its parent Microsoft Business Division, which was the company?s most profitable last quarter . 
While the new subscription-based model could mean lower quarterly revenue in the short-term, Microsoft is hoping it will producer bigger margins year-over-year ? and its recent moves should hearten investors who hope Redmond is right. Apple Inc. (NASDAQ: AAPL ) At its recent Worldwide Developers Conference, Apple Inc. (NASDAQ: AAPL ) announced ?iWork for iCloud,? which is a little weird, because iWork was already available (kind of) in the (i)Cloud. The suite had been languishing for years, receiving only incremental updates since the release of ??gulp ? iWork ?09. Sure, Apple Inc. (NASDAQ: AAPL ) put out versions for iPad in 2010, and pushed content to the cloud last July with the release of OS X 10.8 Mountain Lion. But these were incremental changes ? ?nothing to really compete with the Microsoft or Google juggernauts. iWork for iCloud might change that. The big difference here is browser-based editing, which puts the suite in direct competition with Google Docs for the first time. But there are a couple of reasons I think Cuptertino?s offering will still fall short. First of all, Apple Inc. (NASDAQ: AAPL ) made no mention of real-time collaboration. The ability to watch what your colleagues are typing and chat about it is one of the strongest affordances of working in the cloud. Microsoft has promised it, Google (of course) has it, but Apple doesn?t seem concerned. I think it?s a real missed opportunity. Secondly, Apple?s iCloud has been notoriously unreliable . The company famous for simple functionality has failed to live up to Steve Jobs? claim: ?It just works.? Apple will need to make substantial improvements if it wants to convince anyone that iWork in iCloud is the office suite solution they?re looking for. Of course, iWork and the rest of Apple?s software offerings provide only a small fraction of its revenue. Last quarter it made a combined $38 billion from its hardware and only $4.1 billion from software and iTunes Store sales. 
iWork for iCloud?s value, if it is to provide one, will be to drive sales of Apple?s hardware. We?ll have to wait and see if the new offering is any more successful than the last version. Google Inc (NASDAQ: GOOG ) I don?t need to tell you that Google Docs is popular. But I will anyway. Consulting firm Gartner was surprised to find that between 33% and 50% of cloud-based office users were on Google Docs in 2012 ? compared with 10% in 2007. That?s huge growth and a huge market share for a product competing with the one?called ?Office.? Of course, Google Docs is free (with the exception of their enterprise offerings), meaning the product produces only a little more than 1% of Google?s revenue. Still, like many of Google?s offerings, Docs is about bringing users into the Google ecosystem. Unlike Office, Google has built Docs from the ground up as an online tool, while Microsoft has had to adapt its offerings for the cloud. In some areas, Google might never be able to replicate Office. But for many businesses, Docs might be a viable option. And as working in the cloud becomes more normal, I think you?ll see more and more enterprise customers turning to Google?s solutions for their document, calendar, and email needs. Last year, Google Apps provided $1 billion in revenue for Google. That still makes up only 1.4% of the tech giant?s revenue, but I?m not the only one who expects that number to grow. The bottom line The real competition here is between Google and Microsoft. Both have full-featured cloud-based suites that provide a viable option for enterprise and small-business customers. And many regular consumers are likely to choose either Office or Google Docs for their office suite needs, even if those consumers use Apple products. In some ways, the two companies are competing for different customers. But I think Google will continue to eat into Microsoft?s cloud-based office market share. 
Nonetheless, Microsoft is making strong moves to solidify its position in a market where complacency can be deadly. And speaking of complacency, Apple has been slow-moving on cloud-based office solutions. One could?ve been forgiven for thinking it had simply given up the fight before this year?s WWDC, where we saw a glimmer of what might be. Still, Apple needs to make big changes before they can hope to provide a cloud-based office solution for anyone but the most dedicated fans. Steven Yenzer owns shares of Apple. The Motley Fool recommends Apple and Google. The Motley Fool owns shares of Apple, Google, and Microsoft. The article Why Apple?s Head Is in the Cloud originally appeared on Fool.com and is written by Steven Yenzer. Steven is a member of The Motley Fool Blog Network ? entries represent the personal opinion of the blogger and are not formally edited. Copyright ? 1995 ? 2013 The Motley Fool, LLC. All rights reserved. The Motley Fool has a disclosure policy . Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Published: June 4, 2013 at 9:20 am Editor?s Note: Related tickers: UniPixel Inc (NASDAQ: UNXL ) UniPixel to Feature UniBoss Touch Screen Technology at Computex Taipei in Taiwan on June 4-8, 2013 (Sys-Con) UniPixel Inc (NASDAQ: UNXL ), a provider of Performance Engineered Films? 
to the touch screen, flexible printed electronics, and lighting and display markets, will attend Computex Taipei in Taiwan on June 4-8, 2013, where it will showcase product samples and prototypes of its UniBoss? pro-cap, multi-touch sensor film. The company will demonstrate its 10.1? and 13.3? UniBoss prototypes, as well as meet with touch-screen customers and supply chain members. While UniBoss offers linear cost scalability from pocket-size mobile devices to large desktop displays, these two prototype form factors target the highest growth segment of the market. Uni-Pixel Stock Rating Reaffirmed by Cowen Securities (UNXL) (DailyPolitical) UniPixel Inc (NASDAQ:UNXL)?s stock had its ?outperform? rating reaffirmed by equities research analysts at Cowen Securities in a research note issued to investors on Monday, Analyst Ratings.Net reports. They currently have a $46.00 price objective on the stock. Cowen Securities? target price points to a potential upside of 202.43% from the company?s current price. A number of other firms have also recently commented on UniPixel Inc (NASDAQ:UNXL). Analysts at Zacks downgraded shares of UniPixel Inc (NASDAQ:UNXL)?from an ?outperform? rating to a ?neutral? rating in a research note to investors on Monday, May 27th. They now have a $28.20 price target on the stock. Uni-Pixel at Center of Possible Securities Fraud Claims Investigation (Benzinga) ?Build a better mousetrap,? so the saying goes, ?and the world will beat a path to your door.? Saying you built a better mousetrap, however, is not the same as actually doing it. In a press release issued Saturday, Ademi & O?Reilly, LLP, announced an investigation into possible securities fraud claims against UniPixel Inc (NASDAQ:UNXL)?that the law firm said resulted from ?inaccurate statements UniPixel Inc (NASDAQ:UNXL)?made regarding its financial performance and future prospects for the period Dec. 7, 2012 to May 30, 2013.? NASDAQ Decliners Watch List: First Solar, Inc. 
(NASDAQ:FSLR), Uni-Pixel, Inc. (NASDAQ:UNXL), and SolarCity Corporation (NASDAQ:SCTY) Added to Growing Stock Report?s NASDAQ Decliners Watch List. (SBWire) UniPixel Inc (NASDAQ:UNXL)?a company that delivers performance engineered films to the display, touch screen, and flexible electronics market segments in the United States is currently down (-0.66%) on 2,697,581 shares traded after Seeking Alpha Questioned Quality of Touch Mesh. UniPixel Inc (NASDAQ:UNXL)?is currently down (-65.45%) from its recent 52-week high which has prompted Growing Stock Report to add the stock to their NASDAQ Decliners Watch List. Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ]

scala> sql("SELECT freme_topic, text FROM flattened where freme_topic = '{\"anchor\":\"NOT\",\"resource\":\"http://dbpedia.org/resource/Not\"}' limit 50").collect().foreach(println)
15/09/16 14:44:53 INFO InMemoryColumnarTableScan: Predicate (freme_topic#43 = {"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"}) generates partition filter: ((freme_topic.lowerBound#721 <= {"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"}) && ({"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"} <= freme_topic.upperBound#720))
15/09/16 14:44:53 INFO SparkContext: Starting job: collect at :20
15/09/16 14:44:53 INFO DAGScheduler: Got job 5 (collect at :20) with 1 output partitions (allowLocal=false)
15/09/16 14:44:53 INFO DAGScheduler: Final stage: ResultStage 14(collect at :20)
15/09/16 14:44:53 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 13)
15/09/16 14:44:53 INFO DAGScheduler: Missing parents: List()
15/09/16 14:44:53 INFO DAGScheduler: Submitting ResultStage 14 (MapPartitionsRDD[39] at collect at :20), which has no missing parents
15/09/16 14:44:53 INFO MemoryStore: ensureFreeSpace(45944) called with curMem=12505199278, maxMem=33339683635
15/09/16 14:44:53 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 44.9 KB, free 19.4 GB)
15/09/16 14:44:53 INFO MemoryStore: ensureFreeSpace(17259) called with curMem=12505245222, maxMem=33339683635
15/09/16 14:44:53 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 16.9 KB, free 19.4 GB)
15/09/16 14:44:53 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on localhost:58577 (size: 16.9 KB, free: 19.4 GB)
15/09/16 14:44:53 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:874
15/09/16 14:44:53 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[39] at collect at :20)
15/09/16 14:44:53 INFO TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
15/09/16 14:44:53 INFO TaskSetManager: Starting task 0.0 in stage 14.0 (TID 3132, localhost, PROCESS_LOCAL, 1165 bytes)
15/09/16 14:44:53 INFO Executor: Running task 0.0 in stage 14.0 (TID 3132)
15/09/16 14:44:53 INFO BlockManager: Found block rdd_16_0 locally
15/09/16 14:44:54 INFO Executor: Finished task 0.0 in stage 14.0 (TID 3132). 172134 bytes result sent to driver
15/09/16 14:44:54 INFO TaskSetManager: Finished task 0.0 in stage 14.0 (TID 3132) in 18 ms on localhost (1/1)
15/09/16 14:44:54 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool
15/09/16 14:44:54 INFO DAGScheduler: ResultStage 14 (collect at :20) finished in 0.018 s
15/09/16 14:44:54 INFO DAGScheduler: Job 5 finished: collect at :20, took 0.039973 s
[{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ]
[{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: July 3, 2013 at 9:23 am The housing market is definitely on the mend. Depending on how you want to slice the cattle, you can make a lot of money. I think though that not every investor is interested?in or?willing to take on an inordinate amount of risk. Because of this, I am going to lay out some key macroeconomic indicators, and get to the meat of the argument?as to?whether or not investors should even have a position in housing stocks. The economics, can?t ignore these Source: Ycharts The trend is your friend and the housing market is picking back up again. 
In certain areas of the United States, the amount of money spent on a mortgage is cheaper than the price of rent. Assuming that?the number of people employed increases and the economy continues to recover, the housing recovery should be well on its way. Source: Federal Reserve Going forward, the real?gross domestic product?is projected to grow at a 2.3% to 2.5% rate. If that is the case, investors should position themselves in housing because?the housing stocks would appreciate rapidly in a cyclical economic rebound. KB Home just announced earnings KB Home (NYSE:KBH) reported a fairly strong quarter. The company was able to increase its revenues by 73% year-over-year (this is a significant improvement; I?wonder where all the bears went on this one.) The company is continuing to recover. The company?s deliveries were up by 39% year-over-year, and the average selling price of homes?grew by around 25% year-over-year. The company?s property backlog is up by an additional 19%. Perhaps back logs are up because investors are fearful of missing out on the next leg-up in the property market. The company reported a loss of $0.04 per share versus?a year-ago period loss of $0.31 per share.?It also reported net income growth. Because revenues were up by $221 million, costs were comparatively up by $197 million. The net difference between the two was what contributed to the company?s net income. Analysts on a consensus basis were anticipating the company to report a $0.06 loss for the quarter, but KB Home (NYSE:KBH) beat analyst estimates by $0.02. Investable insights & another alternative Investors should consider buying a home. Ignore bonds and enjoy the safety of an appreciating real estate portfolio. Now I?m not saying that a home should be your only investment; I am saying that homebuilders are selling homes for ever higher prices. You want to buy on an up-trend.?The trend is your friend, after all, and it is obviously the time to own a bit of the American dream. 
If?owning a home is a little bit risky, however, why not consider?The Home Depot, Inc. (NYSE:HD)? The company is exposed to the housing sector through home improvement sales. After someone buys a home there?s usually a lot to fix, a lot to upgrade, and a lot to buy. Everything from gardening improvement, paint changes, pipe fixes, toilet replacement, and counter top changes can all be done at The Home Depot, Inc. (NYSE:HD). The company?s stock currently trades at a bit of a hefty valuation (with a 20.5 forward earnings multiple.)?In 2012, the company?was able to grow its earnings per share by 21.5%. The growth in earnings was driven by operating profit margins improving by 93 basis points to 10.39%. The company also repurchased $4 billion in shares, which also contributed to earnings-per-share growth. Analysts are pretty optimistic?about the company?s future. Disciplined cost management, paired with stronger macroeconomic indicators and share buybacks, will grow the company?s earnings going forward. The company?s?stock is projected to grow its earnings by 14.61% per year over the next five years. Why?not own the bank? I think that?Bank of America Corp (NYSE:BAC) could be the most well-positioned bank in terms of earnings growth (I?ll have a separate article dedicated towards the financial sector soon.) The company has a large portfolio of higher-risk securities because in all likelihood, higher-rated (safe) securities are being dumped in favor of riskier assets. The risk premium on BBB-rated bonds is 1.57 currently, which is below the long-run average of 1.867. Assuming that Bank of America Corp (NYSE:BAC) accumulated its BBB mortgages and bonds when risk premiums were above the long-run average, you can basically assume that the company is better positioned than other banks. Source: Bank of America Around 57% of the bank?s assets are below a BBB rating, which implies that the bank is less exposed to coupon note depreciation. 
It is assumed that the interest rates from the lower-rated securities could make up for the bank?s mark-to-market accounting losses from depreciating AAA-rated securities. After all, treasury bonds are AAA-rated assets and those are declining in value right now. What a bank should own are lower-rated securities that pay a higher rate of interest. Those higher rates of interest would make up for the depreciation on higher-quality debt. Fortunately, Bank of America has positioned itself for this already. The CEO, Brian Moynihan, also plans to cut back on spending by $8 billion by the year 2015. This is why analysts on a consensus basis anticipate?that the?company?will?grow its earnings by 23.39% per year over the next five years. The stock has 41.3 earnings ratio right now, which is reasonable when considering the projected rates of growth. Conclusion Investors need exposure to housing in their investment portfolio. Owning an actual house could be the most lucrative choice right now, but there are other options as well. The home ownership population has declined and the total number of households have gone up, so there?s a lot of pent-up demand which can be reflected in the backlog figures presented by KB Home (NYSE:KBH). Using that, as a leading indicator, we can also assume that demand for mortgages and home improvement will be up as well. Therefore, investors should consider a position in?companies such as KB Home (NYSE:KBH), The Home Depot, Inc. (NYSE:HD), and Bank of America Corp (NYSE:BAC). The article The Housing Recovery Is Offering Lucrative Investment Opportunities originally appeared on Fool.com and is written by Alexander Cho. Alexander Cho has no position in any stocks mentioned. The Motley Fool recommends Bank of America and Home Depot. The Motley Fool owns shares of Bank of America. Alexander is a member of The Motley Fool Blog Network ? entries represent the personal opinion of the blogger and are not formally edited. Copyright ? 1995 ? 
2013 The Motley Fool, LLC. All rights reserved. The Motley Fool has a disclosure policy . Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. 
] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By Javier Hasse in Commodities , News Published: June 9, 2014 at 11:16 am On Friday June 6, Andrew Brown, CIO at Emerging Capital Partners, was interviewed at CNBC and talked about investment opportunities with great potential in the African continent. Mr. Brown highlights that his private equity firm is a Pan-African investor, which implies that it endows businesses in the entire continent, not only in South Africa, as many assume. In fact, he states, opportunities in South Africa are less interesting that those present in the rest of the continent. In terms of where the opportunities are in the continent, most people talk about Nigeria, ?because of the young demographic and rapidly growing population? (CNBC interviewer). Mr. Brown further explains that everybody tends to focus on Nigeria because ?it?s a single country with a lot of people.? However, Emerging Capital Partners looks beyond this, and seeks to reach the same population size, delivering products and services, by endowing companies with presence in several smaller countries. He continues, ?The dynamic you?re seeing in Nigeria is a dynamic that?s playing out across Africa. The challenge is how you actually build businesses that can operate and address that market need.? 
When considering investing in Africa, one must take into account that, as a continent, it is growing at 5% per year, and this growth rate is accelerating. Actually, this recently resulted in the Work Bank upgrading its forecast to 6% for the continent. ? But what about the risk? Well, Mr. Brown?s job as a fund manager is to manage that risk in order to get stable returns. ?I can?t tell you there is no country or political risk across Africa, but there are certainly lots of businesses that aren?t really impacted by political risk per se. And then, when we invest, we like to build platform companies that are operating across a number of countries (?) and that provides a diversification not only at the portfolio company level, but then when you aggregate that to the fund level, we have a very diversified portfolio,? Brown assures. Emerging Capital Partners? portfolio comprises investments in 45 out of 54 countries across Africa, and includes telecoms, commodities, and food and drink stocks, amongst others. Its assets under management surpass the $2 billion threshold. Finally, he talks about Africa?s shift towards a consumer-driven economy: ?I think what you?re seeing come through ?Brown assures- is an emerging consumer class and we?re looking to make investments that will provide good quality, well priced, goods and services into that emerging consumer class.? So, maybe, it could be time to consider investing in Africa, and helping this continent, its economy, and its people, often left behind, develop. Watch the full interview: Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. 
Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: June 19, 2013 at 2:22 pm Microsoft Corporation (NASDAQ: MSFT ) recently announced Office for iOS (but not iPad), a great sign for the company?s future in cloud-based office suites ? if not for the future of the Surface. The move is part of a long-standing trend toward web- and cloud-based document software, which?Google Inc (NASDAQ: GOOG ) pioneered years ago.?Apple Inc. (NASDAQ: AAPL ) is now finally dipping more than a toe in with the unveiling of iWork in Cloud at this year?s WWDC. Let?s take a look at current cloud-based office offerings from Microsoft, Apple Inc. (NASDAQ: AAPL ), and Google, and what they mean for investors. Microsoft Corporation (NASDAQ: MSFT ) Redmond shook things up a couple of years ago with their announcement of a subscription-based Office suite. Dubbed Office 365, the program provides access to the Office suite and other Microsoft products for a variable fee per year. With the release of Microsoft Office 2013, the company went all-in, developing a Home Premium version catering to regular consumers, and an education flavor for students looking to save money. By Microsoft?s account, the suite is selling pretty well , and it?s no wonder ? 
the price is right, and wide adoption of Office software means it?s the standard in many organizations and industries. This cloud-based suite is one of the things Microsoft is getting right these days, and I think it?s a prescient move that will cement its place as king of the Office suite for a few more years. With a billion Office users , the company has a pretty big hill to stand on. And that?s important for a company whose Windows division has seen flat growth in the wake of the Windows 8 debacle. On the other hand, Office 365 has driven growth in its parent Microsoft Business Division, which was the company?s most profitable last quarter . While the new subscription-based model could mean lower quarterly revenue in the short-term, Microsoft is hoping it will producer bigger margins year-over-year ? and its recent moves should hearten investors who hope Redmond is right. Apple Inc. (NASDAQ: AAPL ) At its recent Worldwide Developers Conference, Apple Inc. (NASDAQ: AAPL ) announced ?iWork for iCloud,? which is a little weird, because iWork was already available (kind of) in the (i)Cloud. The suite had been languishing for years, receiving only incremental updates since the release of ??gulp ? iWork ?09. Sure, Apple Inc. (NASDAQ: AAPL ) put out versions for iPad in 2010, and pushed content to the cloud last July with the release of OS X 10.8 Mountain Lion. But these were incremental changes ? ?nothing to really compete with the Microsoft or Google juggernauts. iWork for iCloud might change that. The big difference here is browser-based editing, which puts the suite in direct competition with Google Docs for the first time. But there are a couple of reasons I think Cuptertino?s offering will still fall short. First of all, Apple Inc. (NASDAQ: AAPL ) made no mention of real-time collaboration. The ability to watch what your colleagues are typing and chat about it is one of the strongest affordances of working in the cloud. 
Microsoft has promised it, Google (of course) has it, but Apple doesn?t seem concerned. I think it?s a real missed opportunity. Secondly, Apple?s iCloud has been notoriously unreliable . The company famous for simple functionality has failed to live up to Steve Jobs? claim: ?It just works.? Apple will need to make substantial improvements if it wants to convince anyone that iWork in iCloud is the office suite solution they?re looking for. Of course, iWork and the rest of Apple?s software offerings provide only a small fraction of its revenue. Last quarter it made a combined $38 billion from its hardware and only $4.1 billion from software and iTunes Store sales. iWork for iCloud?s value, if it is to provide one, will be to drive sales of Apple?s hardware. We?ll have to wait and see if the new offering is any more successful than the last version. Google Inc (NASDAQ: GOOG ) I don?t need to tell you that Google Docs is popular. But I will anyway. Consulting firm Gartner was surprised to find that between 33% and 50% of cloud-based office users were on Google Docs in 2012 ? compared with 10% in 2007. That?s huge growth and a huge market share for a product competing with the one?called ?Office.? Of course, Google Docs is free (with the exception of their enterprise offerings), meaning the product produces only a little more than 1% of Google?s revenue. Still, like many of Google?s offerings, Docs is about bringing users into the Google ecosystem. Unlike Office, Google has built Docs from the ground up as an online tool, while Microsoft has had to adapt its offerings for the cloud. In some areas, Google might never be able to replicate Office. But for many businesses, Docs might be a viable option. And as working in the cloud becomes more normal, I think you?ll see more and more enterprise customers turning to Google?s solutions for their document, calendar, and email needs. Last year, Google Apps provided $1 billion in revenue for Google. 
That still makes up only 1.4% of the tech giant?s revenue, but I?m not the only one who expects that number to grow. The bottom line The real competition here is between Google and Microsoft. Both have full-featured cloud-based suites that provide a viable option for enterprise and small-business customers. And many regular consumers are likely to choose either Office or Google Docs for their office suite needs, even if those consumers use Apple products. In some ways, the two companies are competing for different customers. But I think Google will continue to eat into Microsoft?s cloud-based office market share. Nonetheless, Microsoft is making strong moves to solidify its position in a market where complacency can be deadly. And speaking of complacency, Apple has been slow-moving on cloud-based office solutions. One could?ve been forgiven for thinking it had simply given up the fight before this year?s WWDC, where we saw a glimmer of what might be. Still, Apple needs to make big changes before they can hope to provide a cloud-based office solution for anyone but the most dedicated fans. Steven Yenzer owns shares of Apple. The Motley Fool recommends Apple and Google. The Motley Fool owns shares of Apple, Google, and Microsoft. The article Why Apple?s Head Is in the Cloud originally appeared on Fool.com and is written by Steven Yenzer. Steven is a member of The Motley Fool Blog Network ? entries represent the personal opinion of the blogger and are not formally edited. Copyright ? 1995 ? 2013 The Motley Fool, LLC. All rights reserved. The Motley Fool has a disclosure policy . Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. 
Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Published: June 4, 2013 at 9:20 am Editor?s Note: Related tickers: UniPixel Inc (NASDAQ: UNXL ) UniPixel to Feature UniBoss Touch Screen Technology at Computex Taipei in Taiwan on June 4-8, 2013 (Sys-Con) UniPixel Inc (NASDAQ: UNXL ), a provider of Performance Engineered Films? to the touch screen, flexible printed electronics, and lighting and display markets, will attend Computex Taipei in Taiwan on June 4-8, 2013, where it will showcase product samples and prototypes of its UniBoss? pro-cap, multi-touch sensor film. The company will demonstrate its 10.1? and 13.3? UniBoss prototypes, as well as meet with touch-screen customers and supply chain members. While UniBoss offers linear cost scalability from pocket-size mobile devices to large desktop displays, these two prototype form factors target the highest growth segment of the market. Uni-Pixel Stock Rating Reaffirmed by Cowen Securities (UNXL) (DailyPolitical) UniPixel Inc (NASDAQ:UNXL)?s stock had its ?outperform? rating reaffirmed by equities research analysts at Cowen Securities in a research note issued to investors on Monday, Analyst Ratings.Net reports. They currently have a $46.00 price objective on the stock. Cowen Securities? target price points to a potential upside of 202.43% from the company?s current price. A number of other firms have also recently commented on UniPixel Inc (NASDAQ:UNXL). Analysts at Zacks downgraded shares of UniPixel Inc (NASDAQ:UNXL)?from an ?outperform? rating to a ?neutral? rating in a research note to investors on Monday, May 27th. They now have a $28.20 price target on the stock. Uni-Pixel at Center of Possible Securities Fraud Claims Investigation (Benzinga) ?Build a better mousetrap,? so the saying goes, ?and the world will beat a path to your door.? 
Saying you built a better mousetrap, however, is not the same as actually doing it. In a press release issued Saturday, Ademi & O?Reilly, LLP, announced an investigation into possible securities fraud claims against UniPixel Inc (NASDAQ:UNXL)?that the law firm said resulted from ?inaccurate statements UniPixel Inc (NASDAQ:UNXL)?made regarding its financial performance and future prospects for the period Dec. 7, 2012 to May 30, 2013.? NASDAQ Decliners Watch List: First Solar, Inc. (NASDAQ:FSLR), Uni-Pixel, Inc. (NASDAQ:UNXL), and SolarCity Corporation (NASDAQ:SCTY) Added to Growing Stock Report?s NASDAQ Decliners Watch List. (SBWire) UniPixel Inc (NASDAQ:UNXL)?a company that delivers performance engineered films to the display, touch screen, and flexible electronics market segments in the United States is currently down (-0.66%) on 2,697,581 shares traded after Seeking Alpha Questioned Quality of Touch Mesh. UniPixel Inc (NASDAQ:UNXL)?is currently down (-65.45%) from its recent 52-week high which has prompted Growing Stock Report to add the stock to their NASDAQ Decliners Watch List. Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. 
The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},Published: March 27, 2013 at 4:57 pm Robert Half International Inc. (NYSE: RHI ) has experienced a decrease in hedge fund interest recently. At the moment, there are tons of metrics investors can use to watch the equity markets. A couple of the best are hedge fund and insider trading sentiment. At Insider Monkey, our studies have shown that, historically, those who follow the best picks of the best money managers can outpace the S&P 500 by a solid margin ( see just how much ). Just as key, positive insider trading sentiment is another way to break down the stock market universe. Just as you?d expect, there are many reasons for a bullish insider to cut shares of his or her company, but just one, very obvious reason why they would behave bullishly. Many empirical studies have demonstrated the useful potential of this tactic if investors know what to do ( learn more here ). Now, let?s take a glance at the key action surrounding Robert Half International Inc. (NYSE: RHI ). What have hedge funds been doing with Robert Half International Inc. (NYSE:RHI)? At year?s end, a total of 22 of the hedge funds we track were bullish in this stock, a change of -21% from the third quarter. With hedgies? capital changing hands, there exists an ?upper tier? of notable hedge fund managers who were upping their holdings meaningfully. Of the funds we track, Ken Griffin?s Citadel Investment Group had the most valuable position in Robert Half International Inc. (NYSE:RHI), worth close to $90 million, accounting for 0.1% of its total 13F portfolio. 
On Citadel Investment Group?s heels is Chuck Royce of Royce & Associates , with a $67 million position; 2.7% of its 13F portfolio is allocated to the company. Other peers with similar optimism include Christopher Lord?s Criterion Capital , Clint Carlson?s Carlson Capital and Alexander Mitchell?s Scopus Asset Management . Due to the fact that Robert Half International Inc. (NYSE:RHI) has faced declining sentiment from the aggregate hedge fund industry, we can see that there is a sect of hedge funds who sold off their full holdings at the end of the year. Intriguingly, SAC Subsidiary?s Sigma Capital Management said goodbye to the biggest stake of all the hedgies we track, valued at an estimated $9 million in stock.. Jeffrey Vinik?s fund, Vinik Asset Management , also said goodbye to its stock, about $8 million worth. These transactions are interesting, as total hedge fund interest dropped by 6 funds at the end of the year. What do corporate executives and insiders think about Robert Half International Inc. (NYSE:RHI)? Insider trading activity, especially when it?s bullish, is most useful when the company in focus has experienced transactions within the past six months. Over the last 180-day time period, Robert Half International Inc. (NYSE:RHI) has seen zero unique insiders purchasing, and 1 insider sales ( see the details of insider trades here ). With the returns demonstrated by our time-tested strategies, everyday investors must always monitor hedge fund and insider trading sentiment, and Robert Half International Inc. (NYSE:RHI) applies perfectly to this mantra. Insider Monkey?s small-cap strategy returned 29.2% between September 2012 and February 2013 versus 8.7% for the S&P 500 index. Try it now by clicking the link above. Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. 
The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By Nate White in News , Options Published: September 30, 2014 at 9:57 am Alibaba Group Holding Ltd (NYSE:BABA)?s options are available on the market. With about 110,000 sold so far, the company enjoys a higher demand for calls than puts at a ratio of about 1.2 to 1 respectively, according to CNBC . The most interesting trade so far has been 3,000 pairs of November $85 puts and $95 calls, for $7.30 each of the contracts. This basically implies that the investor expects Alibaba Group Holding Ltd (NYSE:BABA)?s per share value to remain range-bound between $77.7 and $102.3. ?This is a yield enhancement trade likely against a long stock position assuming that volatility is going to come in, meaning they?re going to benefit from options? prices coming in, but also the stock going sideways,? informed Dan Nathan. Alibaba Group Holding Ltd (NYSE:BABA)?s price is around the $89.3 figure, showing a bump of about 0.5% during the day. We still have six weeks till predictions regarding the value per share falling in the range described by the above trader can be accurately made. So far, there?s little history from which to draw some tendencies or patterns for swings in price. Surely, one thing to notice is that the options will be cheaper as time passes and the market starts to digest better information about Alibaba Group Holding Ltd (NYSE:BABA). ?Here?s Facebook options since it went public in May of 2012. When you look at it, really pricing?s gone down. [?] But really, what you would expect of out a $200 billion market cap company is lower options prices and I think that?s what this trader saw today,? said Dan Nathan. 
One more interesting statistic regarding the financial contracts being traded is the fact that there were more calls than puts, which translates into investors expecting the price actually to rise. This optimism will serve Alibaba Group Holding Ltd (NYSE:BABA) well as expectations might boost the company?s stock value. Disclosure: none Free Report: Warren Buffett and 12 Billionaires Are Crazy About These 7 Stocks Let Warren Buffett, David Einhorn, George Soros, and David Tepper WORK FOR YOU. If you want to beat the low cost index funds by an average of 6 percentage points per year look no further than Warren Buffett?s stock picks. That?s the margin Buffett?s stock picks outperformed the market since 2008. In this free report, Insider Monkey?s market beating research team identified 7 stocks Warren Buffett and 12 other billionaires are crazy about. CLICK HERE NOW for all the details. Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: February 1, 2013 at 1:55 pm Along with the several American companies who are feeling the pinch of the contraction in the European economy, here comes another company who is joining the league. The U.S. based high-fidelity headphone company Koss Corporation (NASDAQ: KOSS ) has disappointed investors with their second quarter earnings. A quick look at the quarter The Milwaukee-based company has reported a decline of 13.5% in its topline and 59.3% in its bottom line in the second quarter earnings. 
The massive decline in the figures is due to the declining demand faced by the European distributors from their customer base. Even though sales from U.S retailers increased due to the addition of new customers, it couldn?t offset the massive loss faced by the company?s Europe division. In addition to the weak European economy, the threatened dockworkers strike on the eastern seaboard also affected sales as certain export orders were disrupted and not shipped at the end of December. Another vital reason for the increasing costs and declining earnings is the amortization of the costs made for the investment in the software development in their Striva product line for the past several years. What?s wrong in Europe? At present, Europe is facing a huge financial crisis, creating a financial uncertainty in US and slowing the country?s economic growth as European customers are buying less US products and services. And due to this sudden lag in US exports, many companies are facing a shrinking top line. ?Companies like Kraft Foods Group Inc (NASDAQ:KRFT) are facing the heat of the financial crisis in the European market as it is decreasing the purchasing power of customers. Thus there is a massive decline in their sale of candy and gum as teenagers are suffering from financial crisis, leading to a decline in their top line. Again, there are companies like Ford Motor Company (NYSE: F ) which in spite of posting strong quarterly earnings are facing difficulties in infusing investors confidence. Although Ford?reported an overall 54% jump in its fourth-quarter profit in comparison to last year, the company?s?revenue has taken a blow in the European markets as Europeans are not only buying less cars, but are also replacing fewer parts. This has forced Ford to report a guidance of $2 billion loss in the European market for the coming year. 
All these companies will continue to experience such a difficult situation unless the European crisis eases out, and the only way to ward off the slump in sales is to device new strategies either by new product development or entering new markets and Koss is very much on the right track. The next move When all the other US companies are creating new strategies to fight back the European crisis, Koss is also not sitting idle. The company expects to launch several new products based on the Striva technology in the next six to 12 months.? The company is now focusing on the development of quality headphones specially designed for women, and is also continuously endorsing their brands with women celebrities. After Dara Torres who was named a Koss endorser in June, Koss has now lined up another athlete endorser, Julia Mancuso. Mancuso, who is an Olympic champion skier, will now endorse Koss by wearing their headphones. ?Koss is also using the social media site Facebook to promote its product by giving fans a chance to win Koss products through its Facebook page. Foolish takeaway As immediate improvement in the economic condition in Europe is not expected, Koss might not outperform in the next financial quarter. However investors can be a bit patient as Koss is definitely fighting back. With the launch of their new products ?and Julia Mancuso promoting the company?s line of fitness and high-fidelity headphones to fans worldwide, their top line growth is expected to increase with more and more acceptance of the product among her fans and common masses. With such innovations and branding strategies Koss can really be prized stock once the European economy settles down. The article Koss Investors be Patient originally appeared on Fool.com and is written by Satarupa Bose. Copyright ? 1995 ? 2013 The Motley Fool, LLC. All rights reserved. The Motley Fool has a disclosure policy . 
Biotech Insider Alert - $5 Stock To Hit $40 $200 Million Dollar Healthcare Hedge Fund's #1 Best Idea Right Now The best healthcare hedge fund out there right now is one of the largest shareholders in this biotech stock. The fund returned more than 20% in each of the last 2 years with a virtually fully hedged portfolio, and it's sending out a BUY signal on this biotech stock. Get your FREE REPORT today (retail value of $300) This is a FREE report from Insider Monkey. Credit Card is NOT required. ] [{"anchor":"NOT","resource":"http://dbpedia.org/resource/Not"},By The Motley Fool in News Published: August 2, 2013 at 10:35 am The auto industry has been hit very hard over the last few years, and most investors have turned away from the once-booming sector. ?As a result, there is value to be found in the auto industry and money to be made. Additionally, with the city of Detroit claiming?bankruptcy in mid-July, the motor city has subsequently injected more value into many of the auto industry stocks by creating a discount. The stocks mentioned herein are the most undervalued stocks I c

jnehring commented 9 years ago

Attached is a list of problem texts that return - http://dbpedia.org/resource/Not

I can't see the attachment; I don't think you can attach files to GitHub issues. I suggest uploading it to this GDrive folder (https://drive.google.com/drive/folders/0B8CeKhHCOSqUfm9aMGM0NlF0VDNFa19ldDNLX21sbE9Vc3NQX1NDdnQwYVdXZFlta0RYR28) and linking to the file from the GitHub issue.

I created a new issue because NOT is being detected as an entity: https://github.com/freme-project/e-Entity/issues/49

johnmcauley commented 9 years ago

Ah, ok will amend now.

m1ci commented 9 years ago

@koidl the topics and associated weights and labels are already provided as part of the output. See http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&input=Berlin+is+in+Germany.&outformat=turtle&language=en&dataset=dbpedia&enrichement=dbpedia-categories

Category information is not included by default. Add the enrichement=dbpedia-categories parameter to the request to include the topics in the output.
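As a rough illustration of the request above, here is a minimal Python sketch that assembles the same URL with and without the categories parameter. The endpoint and parameter names are copied from the link in this comment; the helper name build_freme_ner_url and its arguments are hypothetical, and the development endpoint may no longer be reachable, so the sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint copied from the link in this comment; it may no longer be reachable.
FREME_NER_ENDPOINT = (
    "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/"
)


def build_freme_ner_url(text, language="en", dataset="dbpedia", with_categories=True):
    """Return a FREME NER request URL for `text` (hypothetical helper)."""
    params = {
        "informat": "text",
        "input": text,
        "outformat": "turtle",
        "language": language,
        "dataset": dataset,
    }
    if with_categories:
        # The parameter is spelled "enrichement" in the API, as in the link above.
        params["enrichement"] = "dbpedia-categories"
    # urlencode escapes spaces as "+", matching the example link.
    return FREME_NER_ENDPOINT + "?" + urlencode(params)


print(build_freme_ner_url("Berlin is in Germany."))
```

Leaving with_categories at its default should reproduce the example link; passing with_categories=False gives the default request without topics.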

Any feedback is more than welcome.

We can further improve the results by:

jnehring commented 9 years ago

We should open separate issues for feedback and improvements. This issue already covers too much content, and it is too diverse.

The first version of topic detection is implemented, so I am closing this issue. I have moved the feedback task to #50.