make sure line17-20

shendongCathy commented 1 year ago

https://github.com/Droidtown/AItlas/blob/1bb20e3940582f488a29029d3c25c29e6b9ef83c/tools/ArticutDemo.py#L17

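1. I'd like to confirm whether the wikifileLIST folder is supposed to contain the .json files that have already been processed by person_classifier.
2. When I run the following: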

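import json
from ArticutAPI import Articut

folder_path = "../data/wikifileLIST"
articut = Articut(username, apikey)
topicNameLIST = []
for i in folder_path:  # note: iterating over a str yields its characters, not files
    with open(i) as f:
        topicNameLIST.append(i.replace(".json", ""))
        inputSTR = json.load(f)["abstract"]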
    inputSTR = json.load(f)["abstract"]

I get the following error: builtins.PermissionError: [Errno 13] Permission denied: '.'

Droidtown commented 1 year ago

The PermissionError most likely has nothing to do with whether the files were processed by person_classifier.

This part

for i in folder_path:
    ...

only pulls out the title (topicNameLIST) and the abstract (inputSTR), so the .json files you put in wikiLIST don't necessarily have to be ones that person_classifier has already processed.

geoffcysu commented 1 year ago

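It looks like the problem here is that the files are being read the wrong way. At this point folder_path is a string, so in `for i in folder_path` the variable i takes each character of the string in turn, i.e. ".", ".", "/", "d", "a", ..., and the code ends up opening the wrong thing. The very first "file" it tries to open is ".", the current directory, which is why the error says Permission denied: '.'.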
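A quick way to see this: iterating over a Python str yields its characters one at a time, so the loop never sees the entries inside the folder. A minimal illustration using the path from the question:

```python
folder_path = "../data/wikifileLIST"

# Iterating over a string yields single characters, not directory entries.
chars = [i for i in folder_path]
print(chars[:5])  # → ['.', '.', '/', 'd', 'a']
```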

It should be done the way it's written in tools/person_classifier.py:

import os

for file_name in os.listdir(folder_path):
    with open(os.path.join(folder_path, file_name)) as f:
        pass  # Do what you want with f
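Combining this with the original goal (collecting the title and the abstract from each file), a minimal sketch could look like the one below. The function name load_wiki_abstracts and its limit parameter are hypothetical, not part of AItlas; it assumes each file is a JSON object with an "abstract" key, and it skips non-.json entries and files whose abstract is missing or empty.

```python
import json
import os


def load_wiki_abstracts(folder_path, limit=None):
    """Collect topic names and abstracts from the .json files in folder_path.

    Hypothetical helper: assumes each file is a JSON object with an
    "abstract" key. Files without a non-empty abstract are skipped.
    """
    topicNameLIST = []
    abstractLIST = []
    # sorted() only makes the order deterministic; os.listdir order is arbitrary.
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith(".json"):
            continue  # ignore non-JSON entries such as .DS_Store
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, encoding="utf-8") as f:
            data = json.load(f)
        abstract = data.get("abstract")
        if not abstract:
            continue  # skip missing or empty abstracts
        topicNameLIST.append(os.path.splitext(file_name)[0])  # drop ".json"
        abstractLIST.append(abstract)
        if limit is not None and len(abstractLIST) >= limit:
            break
    return topicNameLIST, abstractLIST
```

Usage would be something like `topicNameLIST, abstractLIST = load_wiki_abstracts("../data/wikifileLIST", limit=500)`.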

shendongCathy commented 1 year ago

Following the approach above, I can now open the files, it seems, but I've run into another problem:

# Read each file in the folder, limited to the first 500 entries
count = 0
for filename in os.listdir(folder_path):
    if count >= 500:
        break
    file_path = os.path.join(folder_path, filename)

    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
        abstract = data.get("abstract")
        if abstract:
            abstract_list.append(abstract)
            topic_name = os.path.splitext(filename)[0]  # strip the extension from the file name
            topicNameLIST.append(topic_name)
            count += 1
abstract = abstract.replace(" ", "").replace("\n", "")
resultDICT = articut.parse(abstract, level="lv2")
toAddLIST = []
for i in resultDICT["result_pos"]:
    if "<ACTION_verb>經歷</ACTION_verb>" in i:
        i = re.sub(pronounDropPat, "", i)
        i = re.sub(innerDropPat, "", i)
        toAddLIST.append(re.sub(purgePat, "", i))

At `for i in resultDICT["result_pos"]:` I get

builtins.TypeError: 'Response' object is not subscriptable


geoffcysu commented 1 year ago

That's strange. resultDICT["result_pos"] should be a list, but according to the error it's a Response object. What was abstract when this error occurred?

Droidtown commented 1 year ago

Ah, this problem has just been solved: he was sending an empty string to the API…
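One way to avoid this is to check the text before calling the parser, and to parse each abstract inside the loop (in the snippet above, articut.parse only runs once, after the loop, on whatever the last abstract was). A minimal sketch, assuming the parser is passed in as a callable; parse_nonempty is a hypothetical helper name, not part of ArticutAPI, and the dict check is a defensive assumption rather than documented API behavior:

```python
def parse_nonempty(abstracts, parse_fn):
    """Run parse_fn on every cleaned, non-empty abstract and collect the results.

    parse_fn stands in for something like lambda s: articut.parse(s, level="lv2").
    """
    results = []
    for abstract in abstracts:
        # Same cleanup as in the thread: remove spaces and newlines.
        cleaned = (abstract or "").replace(" ", "").replace("\n", "")
        if not cleaned:
            continue  # never send an empty string to the parser
        resultDICT = parse_fn(cleaned)
        # Only keep replies that can actually be indexed with "result_pos".
        if not isinstance(resultDICT, dict) or "result_pos" not in resultDICT:
            continue
        results.append(resultDICT)
    return results
```

For example, `resultLIST = parse_nonempty(abstract_list, lambda s: articut.parse(s, level="lv2"))`, after which each resultDICT["result_pos"] can be scanned for "<ACTION_verb>經歷</ACTION_verb>" as before.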

Droidtown / AItlas

make sure line17-20 #1