datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

CorpUsers: get_all and finder returns empty while get single user works ok #2692

Closed amorskoy closed 2 years ago

amorskoy commented 3 years ago

Describe the bug I cannot search for or retrieve all users in 0.8.1. Getting a single corpuser works fine.

To Reproduce Steps to reproduce the behavior:

  1. Register users:
    
    from datahub.metadata.com.linkedin.pegasus2avro.common import TagAssociation, GlobalTags
    from datahub.metadata.com.linkedin.pegasus2avro.identity import CorpUserInfo
    from datahub.metadata.com.linkedin.pegasus2avro.metadata.snapshot import CorpUserSnapshot
    from datahub.metadata.com.linkedin.pegasus2avro.mxe import MetadataChangeEvent

    from poc.datahub.demo.common.datahub_client import DataHubClient

    TAG_AUTOMATION = TagAssociation("urn:li:tag:automation")
    TAG_PERSON = TagAssociation("urn:li:tag:person")


    def _make_user_mce(user_urn, email, full_name, tag):
        mce = MetadataChangeEvent(
            proposedSnapshot=CorpUserSnapshot(
                urn=user_urn,
                aspects=[
                    CorpUserInfo(active=True, email=email, fullName=full_name),
                    GlobalTags(tags=[tag]),
                ],
            )
        )
        return mce


    def register_users():
        user_andrey_urn = "urn:li:corpuser:andrey"
        user_etl_urn = "urn:li:corpuser:etl"
        andrey_mce = _make_user_mce(user_andrey_urn, "andrey@company.com", "Andrey The Scientist", TAG_PERSON)
        etl_mce = _make_user_mce(user_etl_urn, "etl@company.com", "ETL actor", TAG_AUTOMATION)

        emitter = DataHubClient("http://localhost:8080")

        emitter.emit_mce(andrey_mce)
        emitter.emit_mce(etl_mce)

  2. Get single user - works ok:

    curl 'http://localhost:8080/corpUsers/($params:(),name:etl)' -H 'X-RestLi-Protocol-Version:2.0.0' -s | jq


  3. Get ALL - returns empty:

    curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get_all' 'http://localhost:8080/corpUsers' | jq

    {
      "elements": [],
      "paging": { "count": 10, "start": 0, "links": [] }
    }


  4. Search - returns empty:

    curl "http://localhost:8080/corpUsers?q=search&input=*&" -X GET -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'X-RestLi-Method: finder' | jq

    {
      "metadata": { "urns": [], "searchResultMetadatas": [] },
      "elements": [],
      "paging": { "count": 10, "start": 0, "total": 0, "links": [] }
    }



**Expected behavior**
2 users should be found
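
For anyone who wants to script this check instead of running curl by hand, here is a minimal sketch. It rebuilds the three requests from steps 2-4 (URLs and headers copied from the curl commands, minus the trailing `&` on the search URL) and counts the `elements` in a collection response. The helper names `corpuser_requests`, `fetch` and `element_count` are made up for this example and are not part of DataHub or its client.

```python
import json
import urllib.request
from typing import Dict, Tuple

# Header sent with every request in the curl commands above.
RESTLI_VERSION = {"X-RestLi-Protocol-Version": "2.0.0"}


def corpuser_requests(base_url: str, name: str) -> Dict[str, Tuple[str, Dict[str, str]]]:
    """Return (url, headers) pairs for the get, get_all and search calls."""
    base = base_url.rstrip("/")
    return {
        "get": (f"{base}/corpUsers/($params:(),name:{name})", dict(RESTLI_VERSION)),
        "get_all": (f"{base}/corpUsers", {**RESTLI_VERSION, "X-RestLi-Method": "get_all"}),
        "search": (
            f"{base}/corpUsers?q=search&input=*",
            {**RESTLI_VERSION, "X-RestLi-Method": "finder"},
        ),
    }


def fetch(url: str, headers: Dict[str, str]) -> str:
    """Send a GET request and return the response body as text."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def element_count(response_body: str) -> int:
    """Count the entries under "elements" in a collection response."""
    return len(json.loads(response_body).get("elements", []))
```

Against a running instance, `element_count(fetch(*corpuser_requests("http://localhost:8080", "etl")["get_all"]))` would be expected to give 2 after registering both users; with the behavior reported here it gives 0.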

Release version 0.8.1

amorskoy commented 3 years ago

I have discovered that, unlike other entities in my instance, corpUser is empty in both ElasticSearch (no data in the index) and Neo4j (no corpuser nodes).

amorskoy commented 3 years ago

The bug seems to be resolved if I ingest the CorpUser with its identity specified in both places inside proposedSnapshot:

  1. CorpUserSnapshot.urn
  2. CorpUserSnapshot.aspects[CorpUserKey].username

If I omit item 2, the entity is not ingested into ElasticSearch, as username is the only @Searchable field there. Intuitively, I would expect the CorpUserKey aspect to be unnecessary, as I am already specifying the urn in the snapshot.

Does anybody know why we have to do both item 1 and item 2 at the same time?
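
To make the workaround concrete, here is a rough sketch of the idea: derive the username from the URN once, so the snapshot urn and the CorpUserKey username cannot drift apart. It uses plain dicts as a stand-in for the snapshot; the keys `"CorpUserKey"` and `"CorpUserInfo"` are only labels for illustration, not the DataHub wire format, and the helpers `username_from_urn` and `make_user_snapshot` are hypothetical names.

```python
CORPUSER_URN_PREFIX = "urn:li:corpuser:"


def username_from_urn(urn: str) -> str:
    """Return the username part of a corpuser URN ("urn:li:corpuser:etl" gives "etl")."""
    if not urn.startswith(CORPUSER_URN_PREFIX) or len(urn) == len(CORPUSER_URN_PREFIX):
        raise ValueError(f"not a corpuser URN: {urn!r}")
    return urn[len(CORPUSER_URN_PREFIX):]


def make_user_snapshot(urn: str, email: str, full_name: str) -> dict:
    """Build a dict stand-in for a CorpUserSnapshot that carries the identity twice."""
    return {
        # Place 1: the snapshot-level urn.
        "urn": urn,
        "aspects": [
            # Place 2: the key aspect, with the username derived from the same urn.
            {"CorpUserKey": {"username": username_from_urn(urn)}},
            {"CorpUserInfo": {"active": True, "email": email, "fullName": full_name}},
        ],
    }
```

In the real ingestion code from my first comment, the equivalent change would be adding a CorpUserKey aspect to the `aspects` list in `_make_user_mce`, next to CorpUserInfo and GlobalTags.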

anshbansal commented 2 years ago

Closing due to inactivity. Please open a new issue if the problem persists with the latest releases and updated docs. Please reach out on the DataHub Slack for questions.