NAL-i5K / tripal_eutils

ncbi loader via the eutils interface
GNU General Public License v3.0

contacts are created but are not linked to projects #245

Open dsenalik opened 1 year ago

dsenalik commented 1 year ago

Contacts are created from BioSamples and BioProjects, but they are not linked to the source BioProject. That link should be made through the chado.project_contact table. BioSamples, by contrast, are already linked to contacts through the biosourceprovider_id column of the chado.biomaterial table.

dsenalik commented 1 year ago

After loading, you can list the links that could be made with this SQL:

SELECT DISTINCT BP.project_id, C.contact_id FROM biomaterial B
    LEFT JOIN chado.contact C ON B.biosourceprovider_id = C.contact_id
    LEFT JOIN biomaterial_project BP ON B.biomaterial_id = BP.biomaterial_id
    -- match on the (project, contact) pair so that further contacts of an already-linked project are still listed
    LEFT JOIN project_contact PC ON BP.project_id = PC.project_id AND C.contact_id = PC.contact_id
WHERE BP.project_id IS NOT NULL AND C.contact_id IS NOT NULL AND PC.project_id IS NULL;
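
One detail worth checking in an anti-join like this is whether the project_contact match is made per project or per (project, contact) pair. The following is a minimal sketch, not part of the loader, that uses Python's sqlite3 module with toy tables reduced to the key columns. The toy schema and sample IDs are assumptions for illustration; the real Chado tables live in PostgreSQL and have more columns. It uses inner joins in place of LEFT JOIN plus IS NOT NULL, which selects the same rows.

```python
import sqlite3

# Toy stand-ins for the Chado tables (assumption: key columns only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY);
    CREATE TABLE biomaterial (biomaterial_id INTEGER PRIMARY KEY,
                              biosourceprovider_id INTEGER);
    CREATE TABLE biomaterial_project (biomaterial_id INTEGER, project_id INTEGER);
    CREATE TABLE project_contact (project_id INTEGER, contact_id INTEGER,
                                  UNIQUE (project_id, contact_id));
    INSERT INTO contact VALUES (10), (11);
    -- two biomaterials in project 1, provided by two different contacts
    INSERT INTO biomaterial VALUES (100, 10), (101, 11);
    INSERT INTO biomaterial_project VALUES (100, 1), (101, 1);
    -- contact 10 is already linked to project 1; contact 11 is not
    INSERT INTO project_contact VALUES (1, 10);
""")

# Same listing query, with the project_contact match condition left as a parameter.
BASE = """
    SELECT DISTINCT BP.project_id, C.contact_id FROM biomaterial B
        JOIN contact C ON B.biosourceprovider_id = C.contact_id
        JOIN biomaterial_project BP ON B.biomaterial_id = BP.biomaterial_id
        LEFT JOIN project_contact PC ON {match}
    WHERE PC.project_id IS NULL
    ORDER BY BP.project_id, C.contact_id
"""

# Match on the project only: any existing link hides every contact of that project.
per_project = conn.execute(
    BASE.format(match="BP.project_id = PC.project_id")).fetchall()

# Match on the (project, contact) pair: only links that already exist are hidden.
per_pair = conn.execute(
    BASE.format(match="BP.project_id = PC.project_id "
                      "AND C.contact_id = PC.contact_id")).fetchall()

print(per_project)  # []
print(per_pair)     # [(1, 11)]
```

On a freshly loaded database with an empty project_contact table both variants return the same rows. They differ only once some links already exist.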

You can then create the links by prefixing the listing query with an INSERT:

INSERT INTO project_contact (project_id, contact_id)
    SELECT DISTINCT BP.project_id, C.contact_id FROM biomaterial B
    LEFT JOIN chado.contact C ON B.biosourceprovider_id = C.contact_id
    LEFT JOIN biomaterial_project BP ON B.biomaterial_id = BP.biomaterial_id
    -- match on the (project, contact) pair so that only links that already exist are skipped
    LEFT JOIN project_contact PC ON BP.project_id = PC.project_id AND C.contact_id = PC.contact_id
WHERE BP.project_id IS NOT NULL AND C.contact_id IS NOT NULL AND PC.project_id IS NULL;
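
Because the loader may be run more than once, it helps if the back-fill can be repeated safely. The sketch below is a hypothetical illustration using Python's sqlite3 module and toy tables with only the key columns (an assumption; the real tables are in Chado on PostgreSQL). The helper name `link_contacts` and the sample IDs are made up for the example. It writes the pair-level check as NOT EXISTS and shows that a second run adds nothing.

```python
import sqlite3

def link_contacts(conn):
    """Insert missing (project, contact) links and return how many rows were added."""
    count = "SELECT COUNT(*) FROM project_contact"
    before = conn.execute(count).fetchone()[0]
    conn.execute("""
        INSERT INTO project_contact (project_id, contact_id)
        SELECT DISTINCT BP.project_id, C.contact_id FROM biomaterial B
            JOIN contact C ON B.biosourceprovider_id = C.contact_id
            JOIN biomaterial_project BP ON B.biomaterial_id = BP.biomaterial_id
        WHERE NOT EXISTS (
            SELECT 1 FROM project_contact PC
            WHERE PC.project_id = BP.project_id
              AND PC.contact_id = C.contact_id)
    """)
    conn.commit()
    return conn.execute(count).fetchone()[0] - before

# Toy stand-ins for the Chado tables (assumption: key columns only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY);
    CREATE TABLE biomaterial (biomaterial_id INTEGER PRIMARY KEY,
                              biosourceprovider_id INTEGER);
    CREATE TABLE biomaterial_project (biomaterial_id INTEGER, project_id INTEGER);
    CREATE TABLE project_contact (project_id INTEGER, contact_id INTEGER,
                                  UNIQUE (project_id, contact_id));
    INSERT INTO contact VALUES (10), (11);
    -- biomaterial 103 has no provider, so it cannot produce a link
    INSERT INTO biomaterial VALUES (100, 10), (101, 11), (102, 10), (103, NULL);
    INSERT INTO biomaterial_project VALUES (100, 1), (101, 1), (102, 2), (103, 2);
""")

first = link_contacts(conn)   # creates the three missing links
second = link_contacts(conn)  # nothing left to create
links = conn.execute(
    "SELECT project_id, contact_id FROM project_contact "
    "ORDER BY project_id, contact_id").fetchall()

print(first, second)  # 3 0
print(links)          # [(1, 10), (1, 11), (2, 10)]
```

Biomaterials without a provider contribute no rows, and the pair-level check keeps a rerun from inserting duplicate (project_id, contact_id) pairs.
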

dsenalik commented 1 year ago

See also Issue #174 for discussion about improved contact import.