asreview / synergy-dataset

SYNERGY - Open machine learning dataset on study selection in systematic reviews
Creative Commons Zero v1.0 Universal

Add test for datasets #70

Closed J535D165 closed 3 years ago

J535D165 commented 3 years ago

This PR proposes to add a test for each dataset. Dataset requirements are then checked on each PR.
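As a rough illustration of what such a per-dataset check could look like, here is a minimal sketch that flags header columns outside an allowed set. The `ALLOWED_COLUMNS` set and the `unexpected_columns` helper are assumptions made for this example; they are not the test added in this PR, and the project's real requirements may differ.

```python
import csv
import io

# Hypothetical requirement, for illustration only: the allowed column set
# is an assumption, not the project's actual specification.
ALLOWED_COLUMNS = {
    "record_id",
    "title",
    "abstract",
    "year",
    "label_included",
    "duplicate_record_id",
}


def unexpected_columns(csv_text, allowed=ALLOWED_COLUMNS):
    """Return the header columns that are not in the allowed set, in file order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in header if name not in allowed]
```

A test runner could call this helper once per dataset file and fail when the returned list is non-empty.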

J535D165 commented 3 years ago

Hey @terrymyc, the software datasets do have a strange column named title_clean. The values are like this: sinogrambasedmotioncorrectionofpetimagesusingopticalmotiontrackingsystemandlistmodedataacquisition. Do you know why this is the case? Can we drop this?

record_id,title,abstract,year,label_included,title_clean,duplicate_record_id
1,Sinogram-based motion correction of PET images using optical motion tracking system and list-mode data acquisition,"A head motion during brain imaging has been recognized as a source of image degradation and introduces distortion in positron emission tomography (PET) image. There are several techniques to correct the motion artifact, but these techniques cannot correct the motion during scanning. The aim of this study is to develop a sinogram-based motion correction (SBMC) method to correct directly the head motion during PET scanning using a motion tracking system and list-mode data acquisition. This method is a rebinning procedure by which the lines of response (LOR) are geometrically transformed according to the current values of the six-dimensional motion data. Michelogram was recomposed using rebinned LOR and motion corrected sinogram was generated. In the motion corrected image, the blurring artifact due to motion was reduced by SBMC method.",2002,0,sinogrambasedmotioncorrectionofpetimagesusingopticalmotiontrackingsystemandlistmodedataacquisition,
2,A fault tolerant control architecture for automated highway systems,A hierarchical controller for dealing with faults and adverse environmental conditions on an automated highway system is proposed. The controller extends a previous control hierarchy designed to work under normal conditions of operation. The faults are classified according to the capabilities remaining on the vehicle or roadside after the fault has occurred. Information about these capabilities is used by supervisors in each of the layers of the hierarchy to select appropriate fault handling strategies. We outline the strategies needed by the supervisors and give examples of their detailed operation,2000,0,afaulttolerantcontrolarchitectureforautomatedhighwaysystems,
3,Fault tolerant memory design for HW/SW co-reliability in massively parallel computing systems,"A highly dependable embedded fault-tolerant memory architecture for high performance massively parallel computing applications and its dependability assurance techniques are proposed and discussed in this paper. The proposed fault tolerant memory provides two distinctive repair mechanisms: the permanent laser redundancy reconfiguration during the wafer probe stage in the factory to enhance its manufacturing yield and the dynamic BIST/BISD/BISR (built-in-self-test-diagnosis-repair)-based reconfiguration of the redundant resources in field to maintain high field reliability. The system reliability which is mainly determined by hardware configuration demanded by software and field reconfiguration/repair utilizing unused processor and memory modules is referred to as HW/SW Co-reliability. Various system configuration options in terms of parallel processing unit size and processor/memory intensity are also introduced and their HW/SW Co-reliability characteristics are discussed. A modeling and assurance technique for HW/SW Co-reliability with emphasis on the dependability assurance techniques based on combinatorial modeling suitable for the proposed memory design is developed and validated by extensive parametric simulations. Thereby, design and Implementation of memory-reliability-optimized and highly reliable fault-tolerant field reconfigurable massively parallel computing systems can be achieved.",2003,0,faulttolerantmemorydesignforhwswcoreliabilityinmassivelyparallelcomputingsystems,

terrymyc commented 3 years ago

the software datasets do have a strange column named title_clean.

Whoops! It was used to deduplicate but should have been removed in the output. I'll fix this.
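For context, the sample values are consistent with a key built by lowercasing the title and stripping everything except letters and digits. The sketch below shows one way to deduplicate on such a key without writing it to the output. The `clean_title` and `mark_duplicates` names are hypothetical, and treating `duplicate_record_id` as "the `record_id` of the first record with the same key" is an assumption, not the original script's confirmed behaviour.

```python
import re


def clean_title(title):
    """Lowercase the title and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def mark_duplicates(records):
    """Return copies of the records with duplicate_record_id filled in.

    The normalized title is only used as an in-memory lookup key, so it
    never appears as a column in the returned rows.
    """
    first_seen = {}  # normalized title -> record_id of its first occurrence
    result = []
    for record in records:
        key = clean_title(record["title"])
        row = dict(record)
        # Assumption: an empty key (blank title) is never treated as a duplicate.
        row["duplicate_record_id"] = first_seen.get(key, "") if key else ""
        if key:
            first_seen.setdefault(key, record["record_id"])
        result.append(row)
    return result
```

Keeping the key in a local dictionary rather than as a column means there is no helper field to remember to drop before export.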