dtinit / data-transfer-project

The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.
https://dtinit.org/docs/dtp-what-is-it
Apache License 2.0

Use FileStore in FlickrPhotosImporter #679

Open seehamrun opened 5 years ago

seehamrun commented 5 years ago

Similar to #612: it looks like FlickrPhotosImporter still uses Datastore, and if someone has too many photos or albums, this causes an exception.

com.google.cloud.datastore.DatastoreException: The value of property "org.datatransferproject.spi.transfer.types.TempPhotosData" is longer than 1500 bytes.
I0318 09:13:29.088563   at com.google.cloud.datastore.spi.v1.HttpDatastoreRpc.translate(HttpDatastoreRpc.java:128)
I0318 09:13:29.088890   at com.google.cloud.datastore.spi.v1.HttpDatastoreRpc.commit(HttpDatastoreRpc.java:154)
I0318 09:13:29.088898   at com.google.cloud.datastore.DatastoreImpl$4.call(DatastoreImpl.java:496)
I0318 09:13:29.088903   at com.google.cloud.datastore.DatastoreImpl$4.call(DatastoreImpl.java:493)
I0318 09:13:29.088905   at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
I0318 09:13:29.088916   at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
I0318 09:13:29.088962   at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
I0318 09:13:29.088999   at com.google.cloud.datastore.DatastoreImpl.commit(DatastoreImpl.java:492)
I0318 09:13:29.089147   at com.google.cloud.datastore.TransactionImpl.commit(TransactionImpl.java:108)
I0318 09:13:29.089154   at com.google.dataliberation.transfer.worker.extensions.cloud.T2XJobStore.update(T2XJobStore.java:274)
I0318 09:13:29.089234   at org.datatransferproject.datatransfer.flickr.photos.FlickrPhotosImporter.importItem(FlickrPhotosImporter.java:108)
I0318 09:13:29.089242   at org.datatransferproject.datatransfer.flickr.photos.FlickrPhotosImporter.importItem(FlickrPhotosImporter.java:49)
I0318 09:13:29.089349   at org.datatransferproject.transfer.CallableImporter.call(CallableImporter.java:48)
I0318 09:13:29.089420   at org.datatransferproject.transfer.CallableImporter.call(CallableImporter.java:30)
I0318 09:13:29.089590   at org.datatransferproject.types.transfer.retry.RetryingCallable.call(RetryingCallable.java:66)
I0318 09:13:29.089652   at org.datatransferproject.transfer.PortabilityInMemoryDataCopier.copyHelper(PortabilityInMemoryDataCopier.java:121)
I0318 09:13:29.089668   at org.datatransferproject.transfer.PortabilityInMemoryDataCopier.copy(PortabilityInMemoryDataCopier.java:71)
I0318 09:13:29.089676   at org.datatransferproject.transfer.JobProcessor.processJob(JobProcessor.java:109)
I0318 09:13:29.089677   at org.datatransferproject.transfer.Worker.doWork(Worker.java:33)
I0318 09:13:29.089679   at org.datatransferproject.transfer.WorkerMain.poll(WorkerMain.java:142)
I0318 09:13:29.089723   at org.datatransferproject.transfer.WorkerMain.main(WorkerMain.java:68)
I0318 09:13:29.092681 Caused by: com.google.datastore.v1.client.DatastoreException: The value of property "org.datatransferproject.spi.transfer.types.TempPhotosData" is longer than 1500 bytes., code=INVALID_ARGUMENT
I0318 09:13:29.092772   at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:226)
I0318 09:13:29.092778   at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:279)
I0318 09:13:29.092781   at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:186)
I0318 09:13:29.092784   at com.google.datastore.v1.client.Datastore.commit(Datastore.java:87)
I0318 09:13:29.092787   at com.google.cloud.datastore.spi.v1.HttpDatastoreRpc.commit(HttpDatastoreRpc.java:152)

seehamrun commented 5 years ago

It looks like there are several places where we use JobStore.create(UUID jobId, String key, T model) instead of JobStore.create(UUID jobId, String key, InputStream stream). In GoogleJobStore, the first uses Datastore (resulting in the above exception) and the second uses the file store. We should probably just modify the first to use the file store as well.

One example is MicrosoftPhotosImporter. To make the switch, we would need each DataModel type to support conversion to and from an InputStream. (We do this manually in GooglePhotosExporter: convertJsonToInputStream() produces the stream, and the ObjectMapper converts the stream back to TempPhotosData.)
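
A rough sketch of the idea, not the actual GoogleJobStore code: the model overload serializes the object and delegates to the stream overload, so every write takes the blob path and no single property value can hit the 1500-byte limit. SketchJobStore, SketchTempData, toStream and the in-memory map are hypothetical stand-ins, and plain Java serialization is swapped in for the Jackson ObjectMapper only to keep the example dependency-free.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a blob-backed job store (not the real GoogleJobStore).
class SketchJobStore {
  // Stand-in for the file store: raw bytes keyed by "<jobId>-<key>".
  private final Map<String, byte[]> blobs = new HashMap<>();

  // Stream overload: stores the bytes as one blob, with no per-property size limit.
  public void create(UUID jobId, String key, InputStream stream) throws IOException {
    blobs.put(jobId + "-" + key, stream.readAllBytes());
  }

  // Model overload: serializes the model and delegates to the stream overload.
  public <T extends Serializable> void create(UUID jobId, String key, T model)
      throws IOException {
    create(jobId, key, toStream(model));
  }

  // Reads the blob back and deserializes it; returns null if nothing was stored.
  public <T extends Serializable> T findData(UUID jobId, String key, Class<T> type)
      throws IOException, ClassNotFoundException {
    byte[] bytes = blobs.get(jobId + "-" + key);
    if (bytes == null) {
      return null;
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return type.cast(in.readObject());
    }
  }

  // Assumption: Java serialization stands in for the ObjectMapper-based conversion.
  static InputStream toStream(Serializable model) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(model);
    }
    return new ByteArrayInputStream(buffer.toByteArray());
  }
}

// Hypothetical temp-data model: maps exported album ids to imported album ids.
class SketchTempData implements Serializable {
  private static final long serialVersionUID = 1L;
  final Map<String, String> albumIds = new HashMap<>();
}

class FileStoreSketchDemo {
  public static void main(String[] args) throws Exception {
    SketchJobStore store = new SketchJobStore();
    UUID jobId = UUID.randomUUID();

    // Enough albums that the serialized model is far larger than 1500 bytes.
    SketchTempData data = new SketchTempData();
    for (int i = 0; i < 500; i++) {
      data.albumIds.put("album-" + i, "imported-" + i);
    }

    store.create(jobId, "tempPhotosData", data);
    SketchTempData loaded = store.findData(jobId, "tempPhotosData", SketchTempData.class);
    System.out.println("albums restored: " + loaded.albumIds.size());
  }
}
```

With this shape, importers such as FlickrPhotosImporter and MicrosoftPhotosImporter would not need to change their call sites. Only the store decides where the bytes go.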

seehamrun commented 5 years ago

Also related to #623