MicrosoftDocs / microsoft-academic-services

'DataFrame' object has no attribute 'FamilyId' #117

Open LangeJustin opened 4 years ago

LangeJustin commented 4 years ago

It seems the Papers object has no FamilyId attribute. Did you change the ERD? The following snippet is from the Databricks author h-index tutorial:

# F is the PySpark functions module; MAG is assumed to be the graph helper object created earlier in the tutorial.
from pyspark.sql import functions as F

# Get (Paper, EstimatedCitation).
# Treat papers with the same FamilyId as a single paper and sum the EstimatedCitation
Papers = MAG.getDataframe('Papers')
p = Papers.where(Papers.EstimatedCitation > 0) \
  .select(F.when(Papers.FamilyId.isNull(), Papers.PaperId).otherwise(Papers.FamilyId).alias('PaperId'),
          Papers.EstimatedCitation) \
  .alias('p')

This results in the following AttributeError:

AttributeError: 'DataFrame' object has no attribute 'FamilyId'

cyhuang01 commented 4 years ago

@LangeJustin, the FamilyId field was added to MAG in the 2019-06-27 release, so a Papers table from an earlier release would not have that column. Does the issue still exist? Please use samples/HIndexDatabricksSample.py from your own MAG dataset, or samples/pyspark/HIndexSample.py from the latest MAG dataset. The Python script shipped in each MAG dataset is consistent with that dataset's schema.
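
One way to tell which case applies is to look at the column names before running the tutorial code. The following is only a minimal sketch, not part of the MAG samples: `paper_key_columns` is a hypothetical helper, and the two column lists are illustrative stand-ins for `Papers.columns`, not the full MAG schema.

```python
def paper_key_columns(columns):
    """Pick the columns that identify a paper when summing citations.

    Hypothetical helper: if FamilyId is present, prefer it and fall back
    to PaperId for rows without a family; otherwise use PaperId alone.
    """
    if "FamilyId" in columns:
        return ["FamilyId", "PaperId"]
    return ["PaperId"]


# Illustrative column lists standing in for Papers.columns.
new_schema = ["PaperId", "EstimatedCitation", "FamilyId"]  # 2019-06-27 release or later
old_schema = ["PaperId", "EstimatedCitation"]              # earlier release

print(paper_key_columns(new_schema))
print(paper_key_columns(old_schema))
```

In a notebook, `Papers.columns` gives the real list of column names, and `F.coalesce(*[F.col(c) for c in paper_key_columns(Papers.columns)]).alias('PaperId')` should select the same value as the `F.when(...).otherwise(...)` expression when FamilyId exists, while still working on a dataset that lacks it.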