dbcls / humandbs


"USERS (Controlled-Access Data)": keys and values do not match #2

Open tfuji opened 2 months ago

tfuji commented 2 months ago

In json_from_joomla/humandb_20231223_both.json, "Period of Data Use" contains the value that belongs to "Data in Use (Dataset ID)".

      "USERS (Controlled-Access Data)": {
        "Mark Daly": {
          "Principal Investigator": "Mark Daly",
          "Affiliation": "Broad Institute of MIT and Harvard",
          "Research Title": "",
          "Data in Use (Dataset ID)": "",
          "Period of Data Use": "JGAD000101, JGAD000102, JGAD000123, JGAD000124, JGAD000144-JGAD000201, JGAD000220"
        },
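
To list every affected record rather than checking by eye, one could scan the JSON for entries whose "Period of Data Use" holds dataset IDs. The following is a minimal sketch, not the project's script: the function name `find_shifted_users` is hypothetical, the nesting of the real file is not assumed (the walk is recursive), and the `JGAD` plus six digits pattern is inferred from the IDs shown above. The "Example User" record is made up to represent a correctly aligned entry.

```python
import re

# Assumption: dataset IDs look like "JGAD" followed by six digits, as in the excerpt.
DATASET_ID = re.compile(r"JGAD\d{6}")


def find_shifted_users(node, path=()):
    """Yield (path, record) for dicts whose "Period of Data Use" contains a dataset ID."""
    if isinstance(node, dict):
        period = node.get("Period of Data Use")
        if isinstance(period, str) and DATASET_ID.search(period):
            yield path, node
        # Keep walking, because the records may sit at any depth.
        for key, value in node.items():
            yield from find_shifted_users(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_shifted_users(value, path + (index,))


sample = {
    "USERS (Controlled-Access Data)": {
        "Mark Daly": {
            "Principal Investigator": "Mark Daly",
            "Affiliation": "Broad Institute of MIT and Harvard",
            "Research Title": "",
            "Data in Use (Dataset ID)": "",
            "Period of Data Use": "JGAD000101, JGAD000102",
        },
        # Hypothetical, correctly aligned record for comparison.
        "Example User": {
            "Data in Use (Dataset ID)": "JGAD000663",
            "Period of Data Use": "2023/03/19-2024/07/20",
        },
    }
}

shifted = list(find_shifted_users(sample))
for path, record in shifted:
    print(" / ".join(str(part) for part in path))
```

For the real file, the same function could be applied to the result of `json.load` on humandb_20231223_both.json.
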
tfuji commented 2 months ago

@mitsuhashi @skwsm You can check all of the records at the link below. Please fix this. https://github.com/dbcls/humandbs/tree/dev?tab=readme-ov-file#users-controlled-access-data

mitsuhashi commented 2 months ago

@tfuji @skwsm Thank you for your work on this.

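Looking at the scraped JSON, the script does not appear to expect a Country/Region column. For the columns to the right of it, the column names and values appear to be out of alignment.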

dbcls3284:json_from_joomla mitsuhashi$ git branch
  import_json
* import_json_skwsm
  main
dbcls3284:json_from_joomla mitsuhashi$ grep 'Country' humandb_20231223_both.json  | head -10

I checked the Joomla! HTML, and I don't see any problem with the correspondence between column names and values there.

hum0355-v1

https://humandbs.dbcls.jp/en/hum0355-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Maher Eamonn</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">University of Cambridge</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">United Kingdom of Great Britain and Northern Ireland</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Molecular Pathology of Human Genetic Disease</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000663</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2023/03/19-2024/07/20</span></td>
</tr>
</tbody>
</table>

hum0327-v1

https://humandbs.dbcls.jp/en/hum0327-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Michiaki Hamada</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Faculty of Science and Engineering, Waseda University</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Japan</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Construction of RNA-targeted Drug Discovery Database</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000624</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2022/12/26-2025/03/31</span></td>
</tr>
</tbody>
</table> 

hum0320-v1

https://humandbs.dbcls.jp/en/hum0320-v1

<p>&nbsp;</p>
<h1><span style="text-decoration: underline; font-family: helvetica; font-size: 15pt;"><strong>USRES (Controlled-access Data)</strong></span></h1>
<table class="table-style style-greystripes" style="width: 922px; height: 70px;">
<thead>
<tr><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Principal Investigator</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Affiliation</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Country/Region</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Research Title</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Data in Use (Dataset ID)</span></th><th align="center"><span style="font-family: helvetica; font-size: 11pt;">Period of Data Use</span></th></tr>
</thead>
<tbody>
<tr>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Ansuman Satpathy</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Department of Pathology, Stanford University</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">United States of America</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">Epigenetics of Inflammatory Skin Disorders</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">JGAD000597</span></td>
<td><span style="font-family: helvetica; font-size: 10pt; line-height: normal;">2022/07/04-2023/05/31</span></td>
</tr>
</tbody>
</table>
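
Since the HTML itself is consistent, one way to avoid this kind of shift is to build each record from the table's own header cells instead of a fixed column list. The following is a minimal sketch using only the Python standard library; it is not the actual scraping script, and the names `TableRows` and `parse_users_table` are hypothetical. It assumes one table per input and no `colspan`/`rowspan` cells, and the sample markup is abbreviated from the hum0355-v1 table above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <th> cells as headers and of <td> cells as rows."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # normalize whitespace
            self._cell = None
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
        elif tag == "tr":
            if self._row:  # header rows hold only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_users_table(html):
    """Return one dict per body row, keyed by the table's own column names."""
    parser = TableRows()
    parser.feed(html)
    records = []
    for row in parser.rows:
        if len(row) != len(parser.headers):
            # Fail loudly instead of silently shifting values under wrong keys.
            raise ValueError(f"expected {len(parser.headers)} cells, got {len(row)}")
        records.append(dict(zip(parser.headers, row)))
    return records


sample_html = """
<table><thead><tr>
<th><span>Principal Investigator</span></th><th><span>Affiliation</span></th>
<th><span>Country/Region</span></th><th><span>Research Title</span></th>
<th><span>Data in Use (Dataset ID)</span></th><th><span>Period of Data Use</span></th>
</tr></thead><tbody><tr>
<td><span>Maher Eamonn</span></td><td><span>University of Cambridge</span></td>
<td><span>United Kingdom of Great Britain and Northern Ireland</span></td>
<td><span>Molecular Pathology of Human Genetic Disease</span></td>
<td><span>JGAD000663</span></td><td><span>2023/03/19-2024/07/20</span></td>
</tr></tbody></table>
"""

records = parse_users_table(sample_html)
print(records[0]["Period of Data Use"])
```

With this approach a page that has a Country/Region column and a page that lacks one are both mapped correctly, and a row with an unexpected number of cells raises an error rather than producing a shifted record.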