Closed: mshadbolt closed this issue 5 years ago
Some detective work by Rolando and Rodrey revealed that the spreadsheet had over one million empty rows, which was causing the server to crash. So uploading a spreadsheet without these rows should be okay.
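For future reference, here is a minimal sketch (the filenames are hypothetical and this is not necessarily the exact workflow used) of stripping fully empty rows from every sheet before uploading:

import pandas as pd

# Read every sheet without interpreting headers so the multi-row HCA headers are preserved.
sheets = pd.read_excel("meyer_submission.xlsx", sheet_name=None, header=None)  # hypothetical filename

with pd.ExcelWriter("meyer_submission_trimmed.xlsx") as writer:
    for name, df in sheets.items():
        # Drop rows in which every cell is empty, then write the sheet back out otherwise unchanged.
        df.dropna(how="all").to_excel(writer, sheet_name=name, header=False, index=False)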
@parthshahva got in contact about this dataset, as @hannes-ucsc is concerned that the structure might cause issues for the browser, so I will hold off on trying to submit in prod for now. I am still hopeful I can submit early next week.
@hannes-ucsc did you manage to see whether this dataset will cause issues for azul/data browser?
I looked at several bundles in that project and they all have cell suspensions linked to sequence file via sequencing process. This doesn't match the description above:
> Sequence files from Bulk RNA and Whole Genome Sequencing are linked straight from specimen from organism, rather than from cell suspension as all experiments up to now have been
What am I missing?
Can someone check the file counts on the DB for that project? What's the expected bundle count? Is there a mix of bundles, like some with the weird linking and some without? If so, what's a bundle FQID for such a bundle?
The counts there seem correct, apart from the file count, but I guess that is some combination of the number of json files we create and duplicating files in bundles.
Here is the list of bundles that don't have cell suspensions: bundles_no_cell_suspension.txt
example of one: curl -X GET "https://dss.staging.data.humancellatlas.org/v1/bundles/041e4dc6-181b-48f4-9e13-0cf428ecce09?replica=aws&per_page=500" -H "accept: application/json"
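For anyone wanting to reproduce that check, here is a small sketch (assuming the GET /v1/bundles response contains bundle.files[].name, as the curl example above suggests) that reports whether a bundle contains a cell_suspension_*.json file:

import requests

DSS = "https://dss.staging.data.humancellatlas.org/v1"

def has_cell_suspension(bundle_uuid):
    # Fetch the bundle's file listing from the staging DSS and look for a cell suspension metadata file.
    resp = requests.get(f"{DSS}/bundles/{bundle_uuid}", params={"replica": "aws", "per_page": 500})
    resp.raise_for_status()
    return any(f["name"].startswith("cell_suspension_") for f in resp.json()["bundle"]["files"])

print(has_cell_suspension("041e4dc6-181b-48f4-9e13-0cf428ecce09"))  # expected False for the example bundle above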
> The counts there seem correct, apart from the file count, but I guess that is some combination of the number of json files we create and duplicating files in bundles.
DB displays the number of data files. It should represent reality, i.e. how many files were actually submitted. Is there a way to come up with a matching number on Ingest's side?
> Here is the list of bundles that don't have cell suspensions: bundles_no_cell_suspension.txt
All of those bundles were indexed.
The specimen in that bundle (biomaterial_id 367C72hOesophagusBulkRNA) shows up in the DB's samples tab. I think that's a strong indicator that all bundles of that shape were indexed correctly the same way.
> DB displays the number of data files. It should represent reality, i.e. how many files were actually submitted. Is there a way to come up with a matching number on Ingest's side?
I personally uploaded 370 files for the primary submission, which were FASTQs and protocol documents, so I guess the extras are the files that the analysis pipelines generate, including BAMs and the QC metrics etc. I'm not sure if there is a way of querying that from ingest.
Looking at the manifest, all the analysed bundles get an extra 7 files generated, but I'm not sure which ones Azul counts or what is considered a 'data file'.
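One quick way to break those numbers down, as a sketch (the column name file_format is an assumption about the manifest layout), is to tally file formats in a manifest TSV exported from the Data Browser:

import pandas as pd

# Count the files per format in an exported manifest to compare against what the browser reports.
manifest = pd.read_csv("manifest.tsv", sep="\t")
print(manifest["file_format"].value_counts())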
Using the file type facet in the data browser we can ascertain that there are 361 FASTQs, 7 PDFs and one DOC. The FASTQs are what you uploaded. Looking at the 7 PDFs and one DOC, they appear to be the protocol documentation you are referring to. There are 67 CSVs and looking at their file names all but one are submitted by analysis. The one named Tissue_Stability_CBTM_donor_information.csv
is probably the one file missing (361+7+1+1 = 370). According to the tracker, Analysis didn't process all bundles so I think it would be premature to research whether we indexed the secondary bundles. But again, we have no failures for that project in staging, so I am going to call this one confirmed to be working.
[edit: I had omitted one DOC and the counts didn't add up, fixed now]
@HumanCellAtlas/ingest I will be attempting to submit this in prod today.
I have started with trying to submit in staging.
The submission is here: https://ui.ingest.staging.data.humancellatlas.org/submissions/detail/5dc3d80ec338f50008eccc08/overview
First issue is that one of the library_preparation_protocols is failing to validate (https://api.ingest.staging.data.humancellatlas.org/protocols/5dc3d813c338f50008ecccf7)
Can someone take a look at why it is stuck in 'validating' status?
Thanks
@aaclan-ebi as discussed earlier, I accidentally uploaded one file that is not in the spreadsheet. Are you able to please delete this file: https://api.ingest.staging.data.humancellatlas.org/files/5dc3d994c338f50008eccfd5
Everything is now valid, so I will submit in staging.
The submission seemed to work successfully so I will now upload the spreadsheet and data file to prod.
Just to note what happened: the extra file https://api.ingest.staging.data.humancellatlas.org/files/5dc3d994c338f50008eccfd5 was manually deleted by setting it to validating > valid (so the state tracker would be notified) and then deleting it.
Looks like there's an issue with ontology validation when the ontology value can't be found. We released ontology service version 1.0.11 in staging and prod and redeployed the validator (to clear the cache) so that ontology validation would work. After that we retriggered the ontology validation by setting the protocol metadata to the Draft state. It's now valid: https://api.ingest.staging.data.humancellatlas.org/protocols/5dc3d813c338f50008ecccf7
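For future debugging of this kind of failure, a quick sanity check that a term actually resolves against the ontology service. This is only a sketch: the base URL and the assumption that the service exposes the standard OLS search endpoint are unverified, and the query term is just an example.

import requests

OLS_SEARCH = "https://ontology.staging.data.humancellatlas.org/api/search"  # assumed base URL

# Ask the ontology service whether it can find the value the validator is choking on.
resp = requests.get(OLS_SEARCH, params={"q": "RNA sequencing", "exact": "false"})  # example query term
resp.raise_for_status()
print(resp.json()["response"]["numFound"])  # 0 would mean the term cannot be found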
I have begun submitting to prod here: https://ui.ingest.data.humancellatlas.org/submissions/detail/5dc4081771fe4a0008e54859/overview
I created a tombstoning ticket for the old submission here: https://github.com/HumanCellAtlas/data-store/issues/2567
@MightyAx as I will be away for the next few days would it be possible for you to keep an eye on when the project is tombstoned and proceed with the uuid swap and submission?
The current uuid that we want to maintain is: c4077b3c-5c98-4d26-a614-246d12c2e5d7
// Check the project's current UUID:
db.project.find({_id: ObjectId("...")},{"uuid.uuid":1})
// Overwrite the project UUID with the binary form of the UUID we want to keep:
db.project.updateOne({_id: ObjectId("...")}, { $set: {"uuid.uuid": BinData(3,"...") } } )
// Repoint the bundle manifests' fileProjectMap from one project UUID to the other:
db.bundleManifest.updateMany({ "fileProjectMap": { "...": [ "..." ] } }, { $set: { "fileProjectMap": { "...": [ "...." ] } } })
Confirming that all files are valid in prod, so will be ready to submit once the old project is tombstoned and the uuid is swapped. Thanks very much in advance to @MightyAx and @aaclan-ebi for your help with getting this one through
@mshadbolt what is the staging project uuid? I see a few different possibilities
@jahilton Marion is now out of office. She submitted to staging today, so this project must be the one: Project 259f9041-b72f-45ce-894d-b645add2e620, Submission 671a0817-d5d3-4ec0-a730-70b76c13581d
[edit: it helps to include the UUID in the link label, not just the link URL itself (@hannes-ucsc)]
Given that both submissions are equal (same number of bundles generated) and that the bundles were generated at around 12pm (e.g. https://dss.staging.data.humancellatlas.org/v1/bundles/d27901c6-f6ac-4b39-a1fd-a1fb49b507d1/?replica=aws&version=2019-11-07T115502.445801Z), I would give all my pennies to the submission that @MightyAx is pointing to.
@mshadbolt @hannes-ucsc
In the staging submission, one of the bundles could not be indexed by Azul (see the tracker). On the browser page, 359 rather than 361 FASTQ files are present, and 367 total files rather than 370(?).
We need to make sure this won't happen for the prod submission.
pinging @ESapenaVentura, as Marion is now on her way to HCA Asia.
I have no idea what is happening here. Do we know which bundle is missing? That might shed some light.
From my manual search, I think it is this bundle 4c8aab19-9a12-4a77-ab7b-8a93bb109b76 that was not indexed.
https://api.ingest.staging.data.humancellatlas.org/bundleManifests/5dc40662c338f50008ecd80d
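To confirm programmatically which bundle that manifest refers to, a one-liner sketch (the field name bundleUuid is an assumption about the bundleManifest resource):

import requests

# Fetch the ingest bundle manifest and print the bundle UUID it references.
url = "https://api.ingest.staging.data.humancellatlas.org/bundleManifests/5dc40662c338f50008ecd80d"
print(requests.get(url).json().get("bundleUuid"))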
Azul never got a notification for bundle 4c8aab19-9a12-4a77-ab7b-8a93bb109b76. This could be related to a problem in DSS that I recently brought up with them. The signature traceback for this problem is:
[ERROR] 2019-11-07T11:56:43.43Z b6de2a13-1216-5673-bcec-7c2d1ab45754 Error occurred while processing subscription 5152c2b5-c866-4cd3-aa0e-aec87cb88b4d for bundle 4c8aab19-9a12-4a77-ab7b-8a93bb109b76.2019-11-07T115502.447632Z.
Traceback (most recent call last):
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 421, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 416, in _make_request
httplib_response = conn.getresponse()
File "/var/lang/lib/python3.6/http/client.py", line 1346, in getresponse
response.begin()
File "/var/lang/lib/python3.6/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/var/lang/lib/python3.6/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/var/lang/lib/python3.6/socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "/var/lang/lib/python3.6/ssl.py", line 1012, in recv_into
return self.read(nbytes, buffer)
File "/var/lang/lib/python3.6/ssl.py", line 874, in read
return self._sslobj.read(len, buffer)
File "/var/lang/lib/python3.6/ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/python/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/opt/python/lib/python3.6/site-packages/urllib3/util/retry.py", line 400, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/python/lib/python3.6/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 423, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/opt/python/lib/python3.6/site-packages/urllib3/connectionpool.py", line 331, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: ChunkingHTTPSConnectionPool(host='search-dss-index-staging-bobpbiduntwlsh2yllwchsiypy.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/python/lib/python3.6/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
response = self.session.send(prepared_request, **send_kwargs)
File "/opt/python/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/opt/python/lib/python3.6/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: ChunkingHTTPSConnectionPool(host='search-dss-index-staging-bobpbiduntwlsh2yllwchsiypy.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/domovoilib/dss/index/es/backend.py", line 82, in _notify_subscribers
subscription = self._get_subscription(bundle, subscription_id)
File "/var/task/domovoilib/dss/index/es/backend.py", line 99, in _get_subscription
body=subscription_query)
File "/opt/python/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/opt/python/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 632, in search
doc_type, '_search'), params=params, body=body)
File "/opt/python/lib/python3.6/site-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/opt/python/lib/python3.6/site-packages/elasticsearch/connection/http_requests.py", line 84, in perform_request
raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeout(ChunkingHTTPSConnectionPool(host='search-dss-index-staging-bobpbiduntwlsh2yllwchsiypy.us-east-1.es.amazonaws.com', port=443): Read timed out. (read timeout=10))
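The root cause in that traceback is an Elasticsearch read timeout (read timeout=10) while DSS looks up the subscription, which is why the notification was never delivered. As an illustration only (not DSS's actual fix), the elasticsearch-py client can be configured with a longer timeout and automatic retries on timeout:

from elasticsearch import Elasticsearch

# Illustrative sketch: a more forgiving client configuration than a 10 s read timeout.
es = Elasticsearch(
    hosts=["https://search-dss-index-staging-bobpbiduntwlsh2yllwchsiypy.us-east-1.es.amazonaws.com:443"],
    timeout=30,             # longer per-request read timeout
    retry_on_timeout=True,  # retry requests that time out
    max_retries=3,
)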
BTW: Who is @jlzamanian ? There is no real name on the GH profile.
Jennifer Zamanian, Stanford Data Operations I've added my name to my GH profile.
Duh! Sorry, Jennifer.
Looping in @DailyDreaming from the DSS. Lon, could you link this to the ticket under which you are tracking the notification loss issue?
@hannes-ucsc @DailyDreaming Is the issue being discussed something that is intermittent? Does it occur in prod as well as staging? I'm just trying to figure out whether it will definitely occur if I try the submission in prod. Also, can I submit in prod anyway and have it fixed manually by azul/DSS afterwards?
@MightyAx it looks like the project was tombstoned, so it would be great if you could do the uuid swap and submit as soon as possible; the project has now disappeared from the browser and I would like to see it back up.
Target Project UUID: c4077b3c-5c98-4d26-a614-246d12c2e5d7
Target Binary: BinData(3,"Jk2YXDx7B8TX5cISbSQUpg==")
Old submission: 02e89f20-84c8-4daa-aaeb-80f4a85733ff
Old project id: 5cdc5ab7d96dad000859cec1
Old project replacement uuid: 2f406bf2-b2b5-4f2c-a009-feb4686fc4f0
Bundle manifests updated: 21
New submission: fd52efcc-6924-4c8a-b68c-a299aea1d80f
New project id: 5dc4081c71fe4a0008e5485b
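For anyone repeating this swap, the BinData(3, ...) payload can be derived from the plain UUID. A minimal sketch, assuming the ingest database stores UUIDs in the Java-legacy byte order (each 8-byte half reversed), which does reproduce the target binary listed above:

import base64
import uuid

def java_legacy_bindata(u):
    # Reverse each 8-byte half of the UUID (Java-legacy UUID representation),
    # then base64-encode the result for use in BinData(3, "..."):
    b = uuid.UUID(u).bytes
    return base64.b64encode(b[7::-1] + b[:7:-1]).decode()

print(java_legacy_bindata("c4077b3c-5c98-4d26-a614-246d12c2e5d7"))  # Jk2YXDx7B8TX5cISbSQUpg==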
@mshadbolt @MightyAx it would be good to understand why the tombstoning happened significantly before the UUID redirect so we can prevent this scenario from happening again.
The new submission fd52efcc-6924-4c8a-b68c-a299aea1d80f has had its project UUID replaced with the intended one and has been submitted; it is currently processing.
@lauraclarke the main reason for this was that dataops put a halt to tombstoning due to the Azul indexing error above. By the time they gave the go-ahead and the tombstoning was complete, it was already the weekend in the UK, which meant the project was unavailable all weekend.
I agree this isn't ideal and would advocate for at least some kind of placeholder page in between. Ideally that would look like the existing project page just without the data, but I don't know how difficult that would be.
Given that the tombstoning is done by the data store team in California and the uuid swap needs to be done by an ingest dev at EBI, there is always likely to be some kind of gap, but I agree it would be better to have this coordinated somehow.
thanks for the summary @mshadbolt, sounds like pre-planning handovers and figuring out whether there are sensible placeholders before we do this again would be good
Sorry about this. I was trying to coordinate so that the tombstoning would happen early this week, but there was a miscommunication. Pre-planning would make things go more smoothly.
As a data wrangler I need to submit the Meyer dataset (https://github.com/HumanCellAtlas/hca-data-wrangling/issues/86) in prod, but given the issues that the spreadsheet caused, crashing both staging and integration, I thought I would make a ticket to coordinate between the ingest devs and myself as to what needs to be done in order to submit the dataset.
Background:
This dataset has some unique features that may be causing problems with the linking for ingest:
- Sequence files from Bulk RNA and Whole Genome Sequencing are linked straight from specimen from organism, rather than from cell suspension as all experiments up to now have been.
It is not yet known whether ingest is struggling because there are sequencing files linked to different biomaterial types, or because the linking between donors and specimens is complex.
@rdgoite has been working on re-configuring the servers to ensure that they don't fall over when they encounter complicated linking, but ingest have yet to figure out the exact cause.
The project is now submitted in staging: https://staging.data.humancellatlas.org/explore/projects/bc2229e7-e330-435a-8c2a-4275741f2c2d
It exported the correct number of bundles and the linking appears to be correct.
It was not picked up by the staging tracker: https://tracker.staging.data.humancellatlas.org/
Now we need to figure out what needs to be done before I am able to submit to prod, to ensure the servers don't crash.