ben-garside / XCP

XML Content Pipelining
Skipping data validation for manually included items #120

Open VermaBSI opened 8 years ago

VermaBSI commented 8 years ago

Follow on from issue #114:

We need to decide how XCP will successfully include items whose third-party origin means their metadata is often lacking in DWH, which prevents the usual data validation for pipeline allocation.

VermaBSI commented 8 years ago

Hi, I just tried adding these FDA UPIs (which I will need to send next through Live):

30340098 30340099 30340100 30340101 30340102 30340103 30340104 30340105 30340106 30340107

They were all successfully added as XCP items to the pipeline, etc. that I designated, with the correct title and metadata - brilliant!

The only problem is that the system didn't manage to collate a PDF in any of these cases (I did select 'Collate'). I think it's because the PDFs I supplied to Data Enrichment were not deposited in the PDF hot folders, so would not be found by XCP. I was talking to Wojtek, who explained they aren't doing that - and if they were, these third-party PDFs would probably upload to a Failure folder on a daily basis to be dealt with by Moira's team. Ben, you'd know more on all this, so I'll leave it to you to suggest how we could get the PDFs into XCP as collations for sending to Innodata.

VermaBSI commented 8 years ago

Also, I just tried adding fully data-valid UPIs (30152882, 30152866), so I selected 0-Automatic collation in the Pipeline selection dropdown, but when you click Submit it complains that you haven't selected anything in that dropdown.

NB. Don't forget we still want the automatic rule-based assignment for manually added UPIs that are proper Projects in SAP and so should pass all of XCP's data validation.

ben-garside commented 8 years ago

The automatic collation selection should now work after a small fix to the HTML.

As for collating the content manually included above: XCP uses the PDF archive as its source to collate content. If the files are not in there, it cannot collate them. The files will have to be placed in the archive, and then the following day (after being indexed by COLOS) they will be available to collate.

VermaBSI commented 8 years ago

So do I need to instruct Data Enrichment to deposit the PDFs in the PDF archive? IT still haven't set up my access to that following a request I put in over a year ago - and even then, that was only 'Read Only' access, so I doubt I would be able to add PDFs.

ben-garside commented 8 years ago

I don't know what needs to be done in terms of who does what and when in processing the content, just that the files need to be in the archive to be picked up by XCP. Sorry, I can't really help further.

VermaBSI commented 8 years ago

Okay, I'll try and find out (I guess Adam would've probably known...):

Have just spoken to Wojtek - we've organised that Data Enrichment will get Operations Production to deposit any third-party PDFs I email Data Enrichment about into PDF Archive. Once I know those FDA UPIs listed above are there I'll run a re-collate function on them in XCP.

VermaBSI commented 8 years ago

Just tried adding those fully data-valid UPIs (30152882, 30152866) again using Automatic allocation. This time the result for both was 'Stream ID could not be determined (90)', so they have not been successfully added as XCP items.

This is strange, because they were successfully added in the Live version of XCP - albeit as pipeline 2s, and not pipeline 4s, which were expected because these are ISO-based standards that should have ISO XML available. I tried the re-collate function in Live on these a couple of times (once yesterday, once just now), but they are not getting re-collated as pipeline 4s.

Please could you take a look? Ideally the UAT version should add them as pipeline 4s, assuming there is ISO XML available.
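To make the expected behaviour concrete, here is a minimal sketch of the rule being asked for. XCP's actual data rules are not shown in this thread, so the function name, its two inputs, and the fallback to pipeline 2 are all assumptions for illustration only.

```python
def allocate_pipeline(is_iso_based: bool, iso_xml_available: bool) -> int:
    """Hypothetical allocation rule, not XCP's real logic.

    ISO-based items with ISO XML available go to pipeline 4;
    everything else falls back to pipeline 2 (assumed default).
    """
    if is_iso_based and iso_xml_available:
        return 4
    return 2
```

Under this sketch, an ISO-based UPI would only land in pipeline 2 when no ISO XML could be found for it, which is the first thing worth checking for the two UPIs above.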

ben-garside commented 8 years ago

Thanks for that. I have fixed the allocation issue and tested on 30323572, and it collated the content OK.

VermaBSI commented 8 years ago

Hi, thanks - I did the same for ISO-based UPI 30310638, and got a successful p2 collation.

Do you not think, however, that this and the 30323572 you tested should be p4s, since they are ISO-based?

Same issue with the original ones I tested: XCP items 30152882 (XCP6694030) and 30152866 (XCP2955627) - and what I meant was that we will need to send these actual UPIs for real once this is all pushed to Live so it would be good to try and get these right in UAT.

VermaBSI commented 8 years ago

Just realised as well that, if manual inclusion for a UPI fails for any reason using Automatic allocation (e.g. 'Stream ID could not be determined (90)', etc.) then it will be a problem if we cannot re-include that UPI again using manual pipeline selection, because that UPI already exists in XCP.

If it fails inclusion for any reason and through any way (automatic allocation or manual allocation) then logically it should NOT exist in XCP at all.
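A minimal sketch of the all-or-nothing behaviour described here, assuming a toy in-memory store. `ItemStore`, `InclusionError`, and the `allocate` callback are hypothetical names for illustration, not part of XCP.

```python
class InclusionError(Exception):
    """Raised when a UPI cannot be included (hypothetical)."""


class ItemStore:
    """Toy in-memory stand-in for XCP's item table (an assumption)."""

    def __init__(self):
        self.items = {}  # upi -> pipeline number

    def include(self, upi, allocate):
        """Add a UPI; if allocation fails, leave no trace of it."""
        if upi in self.items:
            raise InclusionError(f"UPI {upi} already exists")
        self.items[upi] = None  # provisional record
        try:
            self.items[upi] = allocate(upi)
        except Exception:
            del self.items[upi]  # roll back the provisional record
            raise
        return self.items[upi]
```

With this shape, a UPI that fails automatic allocation (for example with 'Stream ID could not be determined (90)') is removed again, so it can be re-included straight away using manual pipeline selection.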

VermaBSI commented 8 years ago

I am told those FDA UPIs listed above are in PDF Archive now, but re-running the re-collate function on them in XCP doesn't produce any result (PDF collation). I presume it's because re-collate is based on the automatic collation functionality, and these were manually allocated so cannot be re-collated automatically based on the data rules.

Please advise best on how to get PDF collations for these manually added items already in XCP. Since they don't yet exist in Live XCP, there's a chance their fresh inclusion in Live XCP could all work now they're in PDF Archive...but it would still be good to have a way to get PDFs for manually added items that didn't collate upon inclusion for whatever reason...

ben-garside commented 8 years ago

Try again today via the re-collate button.

"...and then the following day (after being indexed by COLOS) they will be available to collate"

VermaBSI commented 8 years ago

I pressed re-collate on each of these FDA UPIs just now, no results again in all cases. Let's see what happens tomorrow (do I need to press re-collate again tomorrow?)

Could I also ask your opinion on my comment about not assigning as p4s above?

ben-garside commented 8 years ago

I have had a look, and the files have been placed in the archive incorrectly. Who did you say put them in for you? They should have a suffix to state what file variant they are; currently I see just UPI.pdf.

VermaBSI commented 8 years ago

Aha...that would have been Operations Production, but my fault really, as I specified for them to be deposited as they were, without adding the -VOR suffix, which I presume is needed.

Could you add these suffixes to save me the time of renaming them and re-instructing their deposit (I don't have access to PDF Archive)?
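For checking future deposits before they reach the archive, a rough sketch of a filename check. The exact naming convention is not confirmed in this thread: the `<UPI>-<VARIANT>.pdf` pattern, the 8-digit UPI, and the function names below are assumptions based only on the "-VOR" suffix mentioned above.

```python
import re

# Assumed convention: 8-digit UPI, hyphen, upper-case variant code, ".pdf".
# Only "-VOR" is mentioned in the thread; other variant codes are a guess.
ARCHIVE_NAME = re.compile(r"^(?P<upi>\d{8})-(?P<variant>[A-Z]+)\.pdf$")


def parse_archive_name(filename):
    """Return (upi, variant) if the name carries a variant suffix, else None."""
    match = ARCHIVE_NAME.match(filename)
    if match is None:
        return None
    return match.group("upi"), match.group("variant")


def find_misnamed(filenames):
    """List the filenames that lack a variant suffix under the assumed pattern."""
    return [name for name in filenames if parse_archive_name(name) is None]
```

Running `find_misnamed` over the list of files about to be deposited would flag names like `30340098.pdf` before they are indexed.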

ben-garside commented 8 years ago

I have asked them to fix this. If they do it today, then the collation should be doable tomorrow after 8/9AM.

VermaBSI commented 8 years ago

Great, thanks - I'll try first thing tomorrow. Just so you know though, talking with Matt H, we will need a release of UAT - as it is - to Live tomorrow so I can get a Sentinel send to Innodata done before I am away next week, so this issue needn't be a barrier if it isn't perfect come tomorrow. I will send an email later today officially asking you to release to Live.

Also, any thoughts on my comment about not assigning as p4s above?

VermaBSI commented 8 years ago

Just tried the re-collation function on those 10 FDA UPIs and they've all worked :P - all PDFs now collated.

So these should manually add without any problem once you've pushed UAT to Live today, and fingers crossed we can get them into Sentinel :)

(I am not going to close this yet because there are still some open questions above around ensuring an item does not exist in the system if it fails manual inclusion for any reason, and also p4 assignment.)

VermaBSI commented 8 years ago

Just realised another problem with this when using the version just pushed to Live: if a UPI already exists as an XCP item in the system following manual inclusion, you cannot then manually add the same UPI again assigned to pipeline 8, because it complains that the UPI already exists in the system.

I know we were discussing the removal of pipeline 8s in the system, but so long as pipeline 8 continues to exist as a stream in XCP, we need a way of adding the sister pipeline 8 item for any manually added pipeline 2-7.
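One way to picture the change being requested is to make the duplicate check depend on both the UPI and whether the item is the pipeline 8 sister, rather than on the UPI alone. This is only a sketch: `ItemRegistry`, `DuplicateItemError`, and the slot scheme are hypothetical and say nothing about how XCP actually stores items.

```python
SISTER_PIPELINE = 8  # pipeline number taken from the thread


class DuplicateItemError(Exception):
    """Raised when a UPI already occupies the requested slot (hypothetical)."""


class ItemRegistry:
    """Toy registry allowing one main item plus one pipeline 8 sister per UPI."""

    def __init__(self):
        self._slots = {}  # (upi, is_sister) -> pipeline number

    def add(self, upi, pipeline):
        slot = (upi, pipeline == SISTER_PIPELINE)
        if slot in self._slots:
            raise DuplicateItemError(
                f"UPI {upi} already has an item in pipeline {self._slots[slot]}"
            )
        self._slots[slot] = pipeline
        return slot
```

Here a UPI added to pipeline 2 can still be added to pipeline 8, while a second attempt at any of pipelines 2-7, or a second pipeline 8 item, is still rejected.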

VermaBSI commented 8 years ago

Just adding Matt @hendersm