astrodataformat / usecases

Paper 2 use cases and requirements

Use Case 10: Discussion #10

Open brianthomas opened 9 years ago

brianthomas commented 9 years ago

I've tried to extract requirements from this Use Case, but I fear I may have misunderstood the intent. So far I have extracted Requirements 10 and 11 from it (they look like distinct requirements to me). Please review and send feedback on any changes needed or problems you see.

migueldvb commented 9 years ago

I think that it is a good idea to separate these two requirements in Use Case 10. We can also say that being able to select part of a dataset is important for implementing parallel I/O, where independent processes each access their own portion of the data. Perhaps this could be added to Requirement 10 or to a new requirement that describes accessing a dataset in parallel.

brianthomas commented 9 years ago

I've tacked your parallel I/O wording onto Use Case 10, but it reads a little as if it was bolted on. Can you write a separate Use Case about parallel I/O access, perhaps around a large-dataset scenario? I worry we are missing some important aspects of this functionality and failing to capture requirements for large datasets in general. Note that I've also added a parallel I/O requirement.

brianthomas commented 9 years ago

Quick note that Requirement 12: Parallel I/O Support seems to overlap with Requirement 10: partial read of format. I'm not sure whether these are really different. I've linked them in the wiki so that the issue is highlighted, but opinions here on this matter would be good.

telegraphic commented 9 years ago

To add to the discussion: agreed that parallel I/O is an important requirement.

While parallel I/O would be useful for a lot of things, here's a specific use case for parallel write in radio astronomy: an FX correlator breaks up the cross-correlation into frequency subbands spread over several compute nodes. To reconstruct the full spectrum, each compute node needs to write its subband to a single file (or file-like object).

And for parallel read: a user wishes to image several subbands of a wide-bandwidth visibility dataset produced by a correlator. Data access should be parallelizable over both time and frequency, so that multiple parallel data reduction pipelines can be run at once on the same dataset.
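To make the two scenarios above concrete, here is a minimal sketch, not a proposal for the format itself. It assumes a flat binary file holding a row-major (time, channel) array of little-endian float32 values, and it uses threads as stand-ins for the correlator's compute nodes. The file name, array sizes, and the helpers `write_subband` and `read_selection` are all hypothetical; a real deployment would more likely rely on something like MPI-IO or parallel HDF5.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset: 8 time samples x 16 channels, split into 4 subbands.
N_TIME, N_CHAN, N_SUBBAND = 8, 16, 4
CHAN_PER_SUBBAND = N_CHAN // N_SUBBAND
ITEM = struct.calcsize("<f")  # bytes per little-endian float32 sample


def offset(t, chan):
    """Byte offset of sample (t, chan) in the assumed row-major layout."""
    return (t * N_CHAN + chan) * ITEM


def write_subband(path, subband):
    """Stand-in for one compute node writing only its own subband."""
    lo = subband * CHAN_PER_SUBBAND
    # Fake correlator output: every sample carries its subband index.
    payload = struct.pack(f"<{CHAN_PER_SUBBAND}f",
                          *[float(subband)] * CHAN_PER_SUBBAND)
    with open(path, "r+b") as f:  # each writer uses its own file handle
        for t in range(N_TIME):
            f.seek(offset(t, lo))
            f.write(payload)


def read_selection(path, t0, t1, c0, c1):
    """Partial read: fetch only times [t0, t1) and channels [c0, c1)."""
    n = c1 - c0
    rows = []
    with open(path, "rb") as f:
        for t in range(t0, t1):
            f.seek(offset(t, c0))
            rows.append(list(struct.unpack(f"<{n}f", f.read(n * ITEM))))
    return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "visibilities.dat")
        # Preallocate the whole file once; writers then fill disjoint regions.
        with open(path, "wb") as f:
            f.truncate(N_TIME * N_CHAN * ITEM)
        with ThreadPoolExecutor(max_workers=N_SUBBAND) as pool:
            list(pool.map(lambda s: write_subband(path, s), range(N_SUBBAND)))
        # A reader pulls two time samples spanning subbands 1 and 2.
        return read_selection(path, 0, 2, 4, 12)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that both cases depend on the same property: the layout lets a writer or reader compute the byte range of its own time/frequency block without touching the rest of the file.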

brianthomas commented 9 years ago

@telegraphic Thank you. Those are good details. Do you think you could meld them into Use Case 10?

migueldvb commented 9 years ago

I think that it makes sense to have a separate use case for parallel I/O, because selecting part of a large dataset as described in Use Case 10 can have other important applications. The radio astronomy example for parallel data analysis is very good. I can write a use case for distributed data access that will be related to the new Requirement 12. Please feel free to add more details specific to the radio astronomy case.

telegraphic commented 9 years ago

Use Case 17 is looking good!

migueldvb commented 9 years ago

Thanks @telegraphic, could you add the example of parallel data access in radio astronomy to Use Case 17?

telegraphic commented 9 years ago

Added it in; feel free to edit as required.

migueldvb commented 9 years ago

Great, thank you!